Overview

Dataset statistics

Number of variables: 29
Number of observations: 1949630
Missing cells: 16662616
Missing cells (%): 29.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 431.4 MiB
Average record size in memory: 232.0 B
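
The percentage and per-record figures in this table follow from the raw counts. As a quick cross-check, the sketch below recomputes them; it assumes that "Missing cells (%)" is taken over variables × observations and that 1 MiB is 1024 × 1024 bytes.

```python
# Figures copied from the Dataset statistics table.
n_variables = 29
n_observations = 1_949_630
missing_cells = 16_662_616
total_size_mib = 431.4

# Assumption: the missing-cell percentage is relative to all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both derived values round to the figures reported above (29.5% and 232.0 B).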

Variable types

Categorical: 20
Unsupported: 1
Numeric: 8

Alerts

CRASH DATE has a high cardinality: 3804 distinct values [High cardinality]
CRASH TIME has a high cardinality: 1440 distinct values [High cardinality]
LOCATION has a high cardinality: 262859 distinct values [High cardinality]
ON STREET NAME has a high cardinality: 17394 distinct values [High cardinality]
CROSS STREET NAME has a high cardinality: 19731 distinct values [High cardinality]
OFF STREET NAME has a high cardinality: 202042 distinct values [High cardinality]
CONTRIBUTING FACTOR VEHICLE 1 has a high cardinality: 61 distinct values [High cardinality]
CONTRIBUTING FACTOR VEHICLE 2 has a high cardinality: 61 distinct values [High cardinality]
CONTRIBUTING FACTOR VEHICLE 3 has a high cardinality: 51 distinct values [High cardinality]
VEHICLE TYPE CODE 1 has a high cardinality: 1450 distinct values [High cardinality]
VEHICLE TYPE CODE 2 has a high cardinality: 1622 distinct values [High cardinality]
VEHICLE TYPE CODE 3 has a high cardinality: 230 distinct values [High cardinality]
VEHICLE TYPE CODE 4 has a high cardinality: 91 distinct values [High cardinality]
VEHICLE TYPE CODE 5 has a high cardinality: 63 distinct values [High cardinality]
NUMBER OF PERSONS INJURED is highly overall correlated with NUMBER OF PEDESTRIANS INJURED and 1 other field [High correlation]
NUMBER OF PERSONS KILLED is highly overall correlated with NUMBER OF PEDESTRIANS KILLED and 2 other fields [High correlation]
NUMBER OF MOTORIST INJURED is highly overall correlated with NUMBER OF PERSONS INJURED [High correlation]
NUMBER OF MOTORIST KILLED is highly overall correlated with NUMBER OF PERSONS KILLED [High correlation]
NUMBER OF PEDESTRIANS KILLED is highly overall correlated with NUMBER OF PERSONS KILLED and 1 other field [High correlation]
NUMBER OF CYCLIST KILLED is highly overall correlated with NUMBER OF PERSONS KILLED and 1 other field [High correlation]
CONTRIBUTING FACTOR VEHICLE 3 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields [High correlation]
CONTRIBUTING FACTOR VEHICLE 4 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields [High correlation]
CONTRIBUTING FACTOR VEHICLE 5 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 4 other fields [High correlation]
LATITUDE is highly overall correlated with LONGITUDE [High correlation]
LONGITUDE is highly overall correlated with LATITUDE [High correlation]
NUMBER OF PEDESTRIANS INJURED is highly overall correlated with NUMBER OF PERSONS INJURED [High correlation]
NUMBER OF CYCLIST INJURED is highly overall correlated with VEHICLE TYPE CODE 4 and 1 other field [High correlation]
CONTRIBUTING FACTOR VEHICLE 1 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 2 and 4 other fields [High correlation]
CONTRIBUTING FACTOR VEHICLE 2 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 1 and 3 other fields [High correlation]
COLLISION_ID is highly overall correlated with VEHICLE TYPE CODE 4 and 1 other field [High correlation]
VEHICLE TYPE CODE 4 is highly overall correlated with NUMBER OF CYCLIST INJURED and 3 other fields [High correlation]
VEHICLE TYPE CODE 5 is highly overall correlated with NUMBER OF CYCLIST INJURED and 3 other fields [High correlation]
BOROUGH has 605465 (31.1%) missing values [Missing]
ZIP CODE has 605701 (31.1%) missing values [Missing]
LATITUDE has 224428 (11.5%) missing values [Missing]
LONGITUDE has 224428 (11.5%) missing values [Missing]
LOCATION has 224428 (11.5%) missing values [Missing]
ON STREET NAME has 405870 (20.8%) missing values [Missing]
CROSS STREET NAME has 719709 (36.9%) missing values [Missing]
OFF STREET NAME has 1636131 (83.9%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 2 has 291964 (15.0%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 3 has 1812944 (93.0%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 4 has 1919228 (98.4%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 5 has 1941480 (99.6%) missing values [Missing]
VEHICLE TYPE CODE 2 has 353893 (18.2%) missing values [Missing]
VEHICLE TYPE CODE 3 has 1817465 (93.2%) missing values [Missing]
VEHICLE TYPE CODE 4 has 1920206 (98.5%) missing values [Missing]
VEHICLE TYPE CODE 5 has 1941717 (99.6%) missing values [Missing]
LATITUDE is highly skewed (γ1 = -21.17608353) [Skewed]
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 34.8813989) [Skewed]
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 55.83766352) [Skewed]
COLLISION_ID has unique values [Unique]
ZIP CODE is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
NUMBER OF PERSONS INJURED has 1527469 (78.3%) zeros [Zeros]
NUMBER OF PERSONS KILLED has 1946976 (99.9%) zeros [Zeros]
NUMBER OF PEDESTRIANS INJURED has 1848975 (94.8%) zeros [Zeros]
NUMBER OF MOTORIST INJURED has 1678914 (86.1%) zeros [Zeros]
NUMBER OF MOTORIST KILLED has 1948606 (99.9%) zeros [Zeros]

Reproduction

Analysis started: 2022-12-05 14:44:27.020096
Analysis finished: 2022-12-05 14:54:24.926646
Duration: 9 minutes and 57.91 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

CRASH DATE
Categorical

Distinct: 3804
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 MiB
01/21/2014
 
1161
11/15/2018
 
1065
12/15/2017
 
999
05/19/2017
 
974
01/18/2015
 
961
Other values (3799)
1944470 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 19496300
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 09/11/2021
2nd row: 03/26/2022
3rd row: 06/29/2022
4th row: 09/11/2021
5th row: 12/14/2021
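
CRASH DATE is profiled as a categorical string, which is why it triggers the high-cardinality alert. The sample rows suggest a month/day/year layout (03/26/2022 and 06/29/2022 both have a second field greater than 12), so the values could be converted to real dates before further analysis. A minimal sketch, assuming that layout holds for every row; `parse_crash_date` is a hypothetical helper name, not part of the report:

```python
from datetime import date, datetime


def parse_crash_date(text: str) -> date:
    """Parse a CRASH DATE string, assumed to be in MM/DD/YYYY form."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# The five sample rows shown in the report.
samples = ["09/11/2021", "03/26/2022", "06/29/2022", "09/11/2021", "12/14/2021"]
parsed = [parse_crash_date(s) for s in samples]

# Real dates sort chronologically, unlike the raw strings.
for d in sorted(set(parsed)):
    print(d.isoformat())
```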

Common Values

ValueCountFrequency (%)
01/21/2014 1161
 
0.1%
11/15/2018 1065
 
0.1%
12/15/2017 999
 
0.1%
05/19/2017 974
 
< 0.1%
01/18/2015 961
 
< 0.1%
02/03/2014 960
 
< 0.1%
03/06/2015 939
 
< 0.1%
05/18/2017 911
 
< 0.1%
01/07/2017 896
 
< 0.1%
03/02/2018 884
 
< 0.1%
Other values (3794) 1939880
99.5%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 4442574
22.8%
/ 3899260
20.0%
2 3582396
18.4%
1 3455960
17.7%
3 646954
 
3.3%
7 599064
 
3.1%
8 598135
 
3.1%
6 588171
 
3.0%
9 572814
 
2.9%
5 571573
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 15597040
80.0%
Other Punctuation 3899260
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4442574
28.5%
2 3582396
23.0%
1 3455960
22.2%
3 646954
 
4.1%
7 599064
 
3.8%
8 598135
 
3.8%
6 588171
 
3.8%
9 572814
 
3.7%
5 571573
 
3.7%
4 539399
 
3.5%
Other Punctuation
ValueCountFrequency (%)
/ 3899260
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19496300
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4442574
22.8%
/ 3899260
20.0%
2 3582396
18.4%
1 3455960
17.7%
3 646954
 
3.3%
7 599064
 
3.1%
8 598135
 
3.1%
6 588171
 
3.0%
9 572814
 
2.9%
5 571573
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19496300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4442574
22.8%
/ 3899260
20.0%
2 3582396
18.4%
1 3455960
17.7%
3 646954
 
3.3%
7 599064
 
3.1%
8 598135
 
3.1%
6 588171
 
3.0%
9 572814
 
2.9%
5 571573
 
2.9%

CRASH TIME
Categorical

Distinct: 1440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 MiB
16:00
 
27307
17:00
 
26755
15:00
 
26661
18:00
 
24709
14:00
 
24464
Other values (1435)
1819734 

Length

Max length: 5
Median length: 5
Mean length: 4.7403538
Min length: 4

Characters and Unicode

Total characters: 9241936
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2:39
2nd row: 11:45
3rd row: 6:55
4th row: 9:35
5th row: 8:13
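
A minimum length of 4 and a mean length of about 4.74 indicate that hours are not zero-padded (2:39 rather than 02:39), so plain string sorting would place "11:45" before "2:39". The sketch below shows one way to parse the values and emit a padded form; it assumes every value is a 24-hour H:MM or HH:MM string, and the helper names are hypothetical.

```python
from datetime import time


def parse_crash_time(text: str) -> time:
    """Parse an 'H:MM' or 'HH:MM' string (24-hour clock assumed)."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def to_padded(text: str) -> str:
    """Return the same time of day as a zero-padded 'HH:MM' string."""
    return parse_crash_time(text).strftime("%H:%M")


# The five sample rows shown in the report.
samples = ["2:39", "11:45", "6:55", "9:35", "8:13"]
print(sorted(to_padded(s) for s in samples))
```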

Common Values

ValueCountFrequency (%)
16:00 27307
 
1.4%
17:00 26755
 
1.4%
15:00 26661
 
1.4%
18:00 24709
 
1.3%
14:00 24464
 
1.3%
13:00 22703
 
1.2%
9:00 20512
 
1.1%
12:00 20465
 
1.0%
19:00 20434
 
1.0%
16:30 19724
 
1.0%
Other values (1430) 1715896
88.0%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
: 1949630
21.1%
0 1803518
19.5%
1 1710640
18.5%
5 828730
9.0%
2 765579
 
8.3%
3 632346
 
6.8%
4 508060
 
5.5%
8 295346
 
3.2%
7 254463
 
2.8%
9 253914
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7292306
78.9%
Other Punctuation 1949630
 
21.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1803518
24.7%
1 1710640
23.5%
5 828730
11.4%
2 765579
10.5%
3 632346
 
8.7%
4 508060
 
7.0%
8 295346
 
4.1%
7 254463
 
3.5%
9 253914
 
3.5%
6 239710
 
3.3%
Other Punctuation
ValueCountFrequency (%)
: 1949630
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 9241936
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
: 1949630
21.1%
0 1803518
19.5%
1 1710640
18.5%
5 828730
9.0%
2 765579
 
8.3%
3 632346
 
6.8%
4 508060
 
5.5%
8 295346
 
3.2%
7 254463
 
2.8%
9 253914
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9241936
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
: 1949630
21.1%
0 1803518
19.5%
1 1710640
18.5%
5 828730
9.0%
2 765579
 
8.3%
3 632346
 
6.8%
4 508060
 
5.5%
8 295346
 
3.2%
7 254463
 
2.8%
9 253914
 
2.7%

BOROUGH
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 605465
Missing (%): 31.1%
Memory size: 14.9 MiB
BROOKLYN
424950 
QUEENS
360087 
MANHATTAN
305117 
BRONX
197581 
STATEN ISLAND
56430 

Length

Max length13
Median length9
Mean length7.4601481
Min length5

Characters and Unicode

Total characters10027670
Distinct characters19
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBROOKLYN
2nd rowBROOKLYN
3rd rowBRONX
4th rowBROOKLYN
5th rowMANHATTAN

Common Values

ValueCountFrequency (%)
BROOKLYN 424950
21.8%
QUEENS 360087
18.5%
MANHATTAN 305117
15.6%
BRONX 197581
 
10.1%
STATEN ISLAND 56430
 
2.9%
(Missing) 605465
31.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]
ValueCountFrequency (%)
brooklyn 424950
30.3%
queens 360087
25.7%
manhattan 305117
21.8%
bronx 197581
14.1%
staten 56430
 
4.0%
island 56430
 
4.0%
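
The two BOROUGH frequency tables use different denominators. Common Values divides by all 1,949,630 rows, missing ones included, whereas the lower-cased table appears to divide by the total number of word tokens, with STATEN ISLAND counted as two words. The sketch below reproduces both sets of percentages from the counts in the report; the tokenisation rule (lower-case, split on whitespace) is an assumption inferred from the output rather than taken from the tool's documentation.

```python
# Counts copied from the Common Values table.
counts = {
    "BROOKLYN": 424950,
    "QUEENS": 360087,
    "MANHATTAN": 305117,
    "BRONX": 197581,
    "STATEN ISLAND": 56430,
}
n_observations = 1949630

# Denominator 1: every row, including the 605465 missing ones.
share_of_rows = {k: 100 * v / n_observations for k, v in counts.items()}

# Denominator 2 (assumed): every whitespace-separated, lower-cased token.
word_counts = {}
for value, count in counts.items():
    for word in value.lower().split():
        word_counts[word] = word_counts.get(word, 0) + count
total_words = sum(word_counts.values())
share_of_words = {k: 100 * v / total_words for k, v in word_counts.items()}

print(f"BROOKLYN rows: {share_of_rows['BROOKLYN']:.1f}%")
print(f"brooklyn words: {share_of_words['brooklyn']:.1f}%")
```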

Most occurring characters

ValueCountFrequency (%)
N 1705712
17.0%
O 1047481
10.4%
A 1028211
10.3%
E 776604
 
7.7%
T 723094
 
7.2%
R 622531
 
6.2%
B 622531
 
6.2%
L 481380
 
4.8%
S 472947
 
4.7%
Y 424950
 
4.2%
Other values (9) 2122229
21.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 9971240
99.4%
Space Separator 56430
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1705712
17.1%
O 1047481
10.5%
A 1028211
10.3%
E 776604
 
7.8%
T 723094
 
7.3%
R 622531
 
6.2%
B 622531
 
6.2%
L 481380
 
4.8%
S 472947
 
4.7%
Y 424950
 
4.3%
Other values (8) 2065799
20.7%
Space Separator
ValueCountFrequency (%)
56430
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9971240
99.4%
Common 56430
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1705712
17.1%
O 1047481
10.5%
A 1028211
10.3%
E 776604
 
7.8%
T 723094
 
7.3%
R 622531
 
6.2%
B 622531
 
6.2%
L 481380
 
4.8%
S 472947
 
4.7%
Y 424950
 
4.3%
Other values (8) 2065799
20.7%
Common
ValueCountFrequency (%)
56430
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10027670
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 1705712
17.0%
O 1047481
10.4%
A 1028211
10.3%
E 776604
 
7.7%
T 723094
 
7.2%
R 622531
 
6.2%
B 622531
 
6.2%
L 481380
 
4.8%
S 472947
 
4.7%
Y 424950
 
4.2%
Other values (9) 2122229
21.2%

ZIP CODE
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 605701
Missing (%): 31.1%
Memory size: 14.9 MiB
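
The report does not say why ZIP CODE is unsupported. One common cause, assumed here and not confirmed by the report, is a column that mixes integers, floats and strings. Under that assumption, a cleaning step could coerce every value to a five-digit string before profiling again. `normalize_zip` is a hypothetical helper:

```python
from typing import Optional


def normalize_zip(value: object) -> Optional[str]:
    """Return a five-digit ZIP string, or None when the value is unusable."""
    if value is None:
        return None
    if isinstance(value, float):
        # NaN, infinities and fractional values are all rejected here.
        if not value.is_integer():
            return None
        value = int(value)
    text = str(value).strip()
    if text.isdigit() and len(text) <= 5:
        # Restore leading zeros that are lost when a ZIP is read as a number.
        return text.zfill(5)
    return None


for raw in [11208, 11208.0, " 10001 ", 7030, float("nan"), "", "N/A"]:
    print(repr(raw), "->", normalize_zip(raw))
```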

LATITUDE
Real number (ℝ)

HIGH CORRELATION
MISSING
SKEWED

Distinct: 124579
Distinct (%): 7.2%
Missing: 224428
Missing (%): 11.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.634495
Minimum: 0
Maximum: 43.344444
Zeros: 3802
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 14.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40.596817
Q1: 40.66807
median: 40.721336
Q3: 40.76943
95-th percentile: 40.861976
Maximum: 43.344444
Range: 43.344444
Interquartile range (IQR): 0.10136

Descriptive statistics

Standard deviation: 1.9113294
Coefficient of variation (CV): 0.047037113
Kurtosis: 447.20649
Mean: 40.634495
Median Absolute Deviation (MAD): 0.051212
Skewness: -21.176084
Sum: 70102713
Variance: 3.6531799
Monotonicity: Not monotonic
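
Several LATITUDE statistics are derived from others in the same tables: the range from the extremes, the IQR from the quartiles, the variance from the standard deviation, and the coefficient of variation from the standard deviation and mean. The sketch below recomputes them from the reported values, using the standard definitions (CV = standard deviation / mean); it does not touch the underlying data.

```python
# Values copied from the LATITUDE quantile and descriptive statistics.
minimum, maximum = 0.0, 43.344444
q1, q3 = 40.66807, 40.76943
mean, std = 40.634495, 1.9113294

value_range = maximum - minimum  # reported: 43.344444
iqr = q3 - q1                    # reported: 0.10136
variance = std ** 2              # reported: 3.6531799
cv = std / mean                  # reported: 0.047037113

print(f"range={value_range:.6f} iqr={iqr:.5f}")
print(f"variance={variance:.7f} cv={cv:.9f}")
```

The range equals the maximum only because the minimum is 0, one of the 3802 zero values; the IQR of about 0.10 degrees is a better guide to the spread of the bulk of the data.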
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 3802
 
0.2%
40.861862 798
 
< 0.1%
40.696033 711
 
< 0.1%
40.8047 691
 
< 0.1%
40.608757 670
 
< 0.1%
40.798256 626
 
< 0.1%
40.759308 603
 
< 0.1%
40.6960346 587
 
< 0.1%
40.675735 505
 
< 0.1%
40.7606005 474
 
< 0.1%
Other values (124569) 1715735
88.0%
(Missing) 224428
 
11.5%
ValueCountFrequency (%)
0 3802
0.2%
30.78418 1
 
< 0.1%
34.783634 1
 
< 0.1%
40.4989488 2
 
< 0.1%
40.4991346 1
 
< 0.1%
40.49931 1
 
< 0.1%
40.4994787 1
 
< 0.1%
40.499659 1
 
< 0.1%
40.49971 1
 
< 0.1%
40.49984 1
 
< 0.1%
ValueCountFrequency (%)
43.344444 1
 
< 0.1%
42.64154 1
 
< 0.1%
42.318317 1
 
< 0.1%
42.107204 1
 
< 0.1%
41.91661 1
 
< 0.1%
41.34796 1
 
< 0.1%
41.258785 1
 
< 0.1%
41.12615 5
< 0.1%
41.12421 1
 
< 0.1%
41.061634 2
 
< 0.1%

LONGITUDE
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 97170
Distinct (%): 5.6%
Missing: 224428
Missing (%): 11.5%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.764712
Minimum: -201.35999
Maximum: 0
Zeros: 3802
Zeros (%): 0.2%
Negative: 1721400
Negative (%): 88.3%
Memory size: 14.9 MiB

Quantile statistics

Minimum: -201.35999
5-th percentile: -74.03485
Q1: -73.97504
median: -73.92747
Q3: -73.866595
95-th percentile: -73.76325
Maximum: 0
Range: 201.35999
Interquartile range (IQR): 0.1084448

Descriptive statistics

Standard deviation: 3.6108736
Coefficient of variation (CV): -0.048951232
Kurtosis: 476.40518
Mean: -73.764712
Median Absolute Deviation (MAD): 0.052816
Skewness: 16.098825
Sum: -1.2725903 × 10^8
Variance: 13.038408
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 3802
 
0.2%
-73.91282 717
 
< 0.1%
-73.98453 697
 
< 0.1%
-73.89063 689
 
< 0.1%
-74.038086 672
 
< 0.1%
-73.91243 651
 
< 0.1%
-73.89686 601
 
< 0.1%
-73.9845292 587
 
< 0.1%
-73.882744 558
 
< 0.1%
-73.89083 539
 
< 0.1%
Other values (97160) 1715689
88.0%
(Missing) 224428
 
11.5%
ValueCountFrequency (%)
-201.35999 1
 
< 0.1%
-201.23706 105
< 0.1%
-89.13527 1
 
< 0.1%
-86.76847 1
 
< 0.1%
-79.61955 1
 
< 0.1%
-79.00183 1
 
< 0.1%
-76.2634 1
 
< 0.1%
-76.02163 1
 
< 0.1%
-74.742 7
 
< 0.1%
-74.25496 1
 
< 0.1%
ValueCountFrequency (%)
0 3802
0.2%
-32.768513 16
 
< 0.1%
-47.209625 3
 
< 0.1%
-73.66301 1
 
< 0.1%
-73.70055 2
 
< 0.1%
-73.700584 11
 
< 0.1%
-73.7005968 10
 
< 0.1%
-73.70061 1
 
< 0.1%
-73.70071 4
 
< 0.1%
-73.70073 1
 
< 0.1%

LOCATION
Categorical

HIGH CARDINALITY
MISSING

Distinct: 262859
Distinct (%): 15.2%
Missing: 224428
Missing (%): 11.5%
Memory size: 14.9 MiB
(0.0, 0.0)
 
3802
(40.861862, -73.91282)
 
685
(40.608757, -74.038086)
 
670
(40.696033, -73.98453)
 
646
(40.8047, -73.91243)
 
597
Other values (262854)
1718802 

Length

Max length25
Median length24
Mean length22.85448
Min length10

Characters and Unicode

Total characters39428594
Distinct characters16
Distinct categories6 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique142573 ?
Unique (%)8.3%

Sample

1st row(40.667202, -73.8665)
2nd row(40.683304, -73.917274)
3rd row(40.709183, -73.956825)
4th row(40.86816, -73.83148)
5th row(40.67172, -73.8971)

Common Values

ValueCountFrequency (%)
(0.0, 0.0) 3802
 
0.2%
(40.861862, -73.91282) 685
 
< 0.1%
(40.608757, -74.038086) 670
 
< 0.1%
(40.696033, -73.98453) 646
 
< 0.1%
(40.8047, -73.91243) 597
 
< 0.1%
(40.6960346, -73.9845292) 587
 
< 0.1%
(40.675735, -73.89686) 504
 
< 0.1%
(40.7606005, -73.9643142) 474
 
< 0.1%
(40.820305, -73.89083) 467
 
< 0.1%
(40.798256, -73.82744) 462
 
< 0.1%
Other values (262849) 1716308
88.0%
(Missing) 224428
 
11.5%
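
LOCATION stores each coordinate pair as a string such as "(40.667202, -73.8665)", and its most common value is "(0.0, 0.0)" with 3802 rows, matching the 3802 zeros in LATITUDE and LONGITUDE. The sketch below parses the string into floats and treats the (0.0, 0.0) pair as missing. Reading (0.0, 0.0) as a placeholder rather than a real position is an assumption, and `parse_location` is a hypothetical helper.

```python
from typing import Optional, Tuple


def parse_location(text: Optional[str]) -> Optional[Tuple[float, float]]:
    """Parse '(lat, lon)' into floats; return None for missing or (0, 0)."""
    if text is None:
        return None
    lat_text, lon_text = text.strip().strip("()").split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Assumption: (0.0, 0.0) is a placeholder, not a real crash site.
    if lat == 0.0 and lon == 0.0:
        return None
    return lat, lon


for raw in ["(40.667202, -73.8665)", "(0.0, 0.0)", None]:
    print(repr(raw), "->", parse_location(raw))
```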

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0.0 7604
 
0.2%
40.861862 798
 
< 0.1%
73.91282 717
 
< 0.1%
40.696033 711
 
< 0.1%
73.98453 697
 
< 0.1%
40.8047 691
 
< 0.1%
73.89063 689
 
< 0.1%
74.038086 672
 
< 0.1%
40.608757 670
 
< 0.1%
73.91243 651
 
< 0.1%
Other values (221738) 3436504
99.6%

Most occurring characters

ValueCountFrequency (%)
7 4323009
11.0%
4 3738369
 
9.5%
. 3450404
 
8.8%
3 3291405
 
8.3%
0 3196746
 
8.1%
9 2545978
 
6.5%
8 2492909
 
6.3%
6 2459003
 
6.2%
5 1969952
 
5.0%
( 1725202
 
4.4%
Other values (6) 10235617
26.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 27355982
69.4%
Other Punctuation 5175606
 
13.1%
Open Punctuation 1725202
 
4.4%
Space Separator 1725202
 
4.4%
Close Punctuation 1725202
 
4.4%
Dash Punctuation 1721400
 
4.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 4323009
15.8%
4 3738369
13.7%
3 3291405
12.0%
0 3196746
11.7%
9 2545978
9.3%
8 2492909
9.1%
6 2459003
9.0%
5 1969952
7.2%
2 1685841
 
6.2%
1 1652770
 
6.0%
Other Punctuation
ValueCountFrequency (%)
. 3450404
66.7%
, 1725202
33.3%
Open Punctuation
ValueCountFrequency (%)
( 1725202
100.0%
Space Separator
ValueCountFrequency (%)
1725202
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1725202
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1721400
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 39428594
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
7 4323009
11.0%
4 3738369
 
9.5%
. 3450404
 
8.8%
3 3291405
 
8.3%
0 3196746
 
8.1%
9 2545978
 
6.5%
8 2492909
 
6.3%
6 2459003
 
6.2%
5 1969952
 
5.0%
( 1725202
 
4.4%
Other values (6) 10235617
26.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39428594
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7 4323009
11.0%
4 3738369
 
9.5%
. 3450404
 
8.8%
3 3291405
 
8.3%
0 3196746
 
8.1%
9 2545978
 
6.5%
8 2492909
 
6.3%
6 2459003
 
6.2%
5 1969952
 
5.0%
( 1725202
 
4.4%
Other values (6) 10235617
26.0%

ON STREET NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 17394
Distinct (%): 1.1%
Missing: 405870
Missing (%): 20.8%
Memory size: 14.9 MiB
BROADWAY
 
17277
ATLANTIC AVENUE
 
15322
BELT PARKWAY
 
13543
3 AVENUE
 
12476
NORTHERN BOULEVARD
 
11965
Other values (17389)
1473177 

Length

Max length32
Median length32
Mean length30.523924
Min length2

Characters and Unicode

Total characters47121613
Distinct characters75
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6127 ?
Unique (%)0.4%

Sample

1st rowWHITESTONE EXPRESSWAY
2nd rowQUEENSBORO BRIDGE UPPER
3rd rowTHROGS NECK BRIDGE
4th rowSARATOGA AVENUE
5th rowMAJOR DEEGAN EXPRESSWAY RAMP

Common Values

ValueCountFrequency (%)
BROADWAY 17277
 
0.9%
ATLANTIC AVENUE 15322
 
0.8%
BELT PARKWAY 13543
 
0.7%
3 AVENUE 12476
 
0.6%
NORTHERN BOULEVARD 11965
 
0.6%
LONG ISLAND EXPRESSWAY 9928
 
0.5%
BROOKLYN QUEENS EXPRESSWAY 9743
 
0.5%
FLATBUSH AVENUE 9741
 
0.5%
LINDEN BOULEVARD 9587
 
0.5%
QUEENS BOULEVARD 9368
 
0.5%
Other values (17384) 1424810
73.1%
(Missing) 405870
 
20.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
avenue 575907
 
16.2%
street 495010
 
13.9%
east 146330
 
4.1%
boulevard 120725
 
3.4%
west 109261
 
3.1%
parkway 68464
 
1.9%
road 64584
 
1.8%
expressway 57674
 
1.6%
island 27790
 
0.8%
queens 25364
 
0.7%
Other values (5330) 1869278
52.5%

Most occurring characters

ValueCountFrequency (%)
27442233
58.2%
E 3471281
 
7.4%
A 1837349
 
3.9%
T 1737082
 
3.7%
R 1570485
 
3.3%
N 1344582
 
2.9%
S 1327418
 
2.8%
U 924511
 
2.0%
O 818479
 
1.7%
V 805864
 
1.7%
Other values (65) 5842329
 
12.4%

Most occurring categories

ValueCountFrequency (%)
Space Separator 27442233
58.2%
Uppercase Letter 18442420
39.1%
Decimal Number 1115475
 
2.4%
Lowercase Letter 111374
 
0.2%
Other Punctuation 4167
 
< 0.1%
Open Punctuation 2888
 
< 0.1%
Close Punctuation 2884
 
< 0.1%
Dash Punctuation 170
 
< 0.1%
Control 1
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 3471281
18.8%
A 1837349
10.0%
T 1737082
9.4%
R 1570485
 
8.5%
N 1344582
 
7.3%
S 1327418
 
7.2%
U 924511
 
5.0%
O 818479
 
4.4%
V 805864
 
4.4%
L 604948
 
3.3%
Other values (16) 4000421
21.7%
Lowercase Letter
ValueCountFrequency (%)
e 14784
13.3%
r 9802
 
8.8%
n 9434
 
8.5%
a 9201
 
8.3%
t 8067
 
7.2%
s 6849
 
6.1%
o 6586
 
5.9%
y 5582
 
5.0%
l 5182
 
4.7%
d 4295
 
3.9%
Other values (16) 31592
28.4%
Decimal Number
ValueCountFrequency (%)
1 253017
22.7%
3 126459
11.3%
2 124841
11.2%
4 105856
9.5%
5 103760
9.3%
6 90709
 
8.1%
8 83796
 
7.5%
7 82447
 
7.4%
9 73625
 
6.6%
0 70965
 
6.4%
Other Punctuation
ValueCountFrequency (%)
. 3062
73.5%
/ 970
 
23.3%
& 61
 
1.5%
' 36
 
0.9%
, 16
 
0.4%
# 16
 
0.4%
@ 6
 
0.1%
Space Separator
ValueCountFrequency (%)
27442233
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2888
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2884
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 170
100.0%
Control
ValueCountFrequency (%)
 1
100.0%
Math Symbol
ValueCountFrequency (%)
> 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28567819
60.6%
Latin 18553794
39.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 3471281
18.7%
A 1837349
9.9%
T 1737082
9.4%
R 1570485
 
8.5%
N 1344582
 
7.2%
S 1327418
 
7.2%
U 924511
 
5.0%
O 818479
 
4.4%
V 805864
 
4.3%
L 604948
 
3.3%
Other values (42) 4111795
22.2%
Common
ValueCountFrequency (%)
27442233
96.1%
1 253017
 
0.9%
3 126459
 
0.4%
2 124841
 
0.4%
4 105856
 
0.4%
5 103760
 
0.4%
6 90709
 
0.3%
8 83796
 
0.3%
7 82447
 
0.3%
9 73625
 
0.3%
Other values (13) 81076
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 47121613
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
27442233
58.2%
E 3471281
 
7.4%
A 1837349
 
3.9%
T 1737082
 
3.7%
R 1570485
 
3.3%
N 1344582
 
2.9%
S 1327418
 
2.8%
U 924511
 
2.0%
O 818479
 
1.7%
V 805864
 
1.7%
Other values (65) 5842329
 
12.4%

CROSS STREET NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 19731
Distinct (%): 1.6%
Missing: 719709
Missing (%): 36.9%
Memory size: 14.9 MiB
3 AVENUE
 
9843
BROADWAY
 
9685
2 AVENUE
 
8421
5 AVENUE
 
7051
7 AVENUE
 
6634
Other values (19726)
1188287 

Length

Max length32
Median length32
Mean length23.181595
Min length1

Characters and Unicode

Total characters28511530
Distinct characters76
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5958 ?
Unique (%)0.5%

Sample

1st row20 AVENUE
2nd rowDECATUR STREET
3rd rowEAST 43 STREET
4th rowEAST GATE PLAZA
5th rowwest 80 street -west 81 street

Common Values

ValueCountFrequency (%)
3 AVENUE 9843
 
0.5%
BROADWAY 9685
 
0.5%
2 AVENUE 8421
 
0.4%
5 AVENUE 7051
 
0.4%
7 AVENUE 6634
 
0.3%
8 AVENUE 6580
 
0.3%
3 AVENUE 6126
 
0.3%
BROADWAY 5680
 
0.3%
1 AVENUE 5318
 
0.3%
PARK AVENUE 4847
 
0.2%
Other values (19721) 1159736
59.5%
(Missing) 719709
36.9%
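
"3 AVENUE" and "BROADWAY" each appear twice in this table as separate values. Together with a median length of 32 that equals the maximum, this suggests that many names carry padding whitespace, although the report does not confirm it. The fifth sample row is also lower-case. A minimal normalisation sketch under those assumptions; `normalize_street` is a hypothetical helper, and which spelling carries the padding is a guess made for illustration:

```python
from collections import Counter


def normalize_street(name: str) -> str:
    """Collapse runs of whitespace, trim both ends, and upper-case the name."""
    return " ".join(name.split()).upper()


# Counts come from the Common Values table; padding the first spelling of
# each name to 32 characters is an assumption, not something the report shows.
raw_counts = [
    ("3 AVENUE".ljust(32), 9843),
    ("3 AVENUE", 6126),
    ("BROADWAY".ljust(32), 9685),
    ("BROADWAY", 5680),
]

merged = Counter()
for name, count in raw_counts:
    merged[normalize_street(name)] += count

print(merged.most_common())
```

If the assumption holds, the distinct count of 19731 overstates the number of real street names, and merged totals such as these would change the ranking of common values.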

Length

Histogram of lengths of the category
ValueCountFrequency (%)
avenue 538295
 
19.8%
street 438533
 
16.1%
east 107028
 
3.9%
west 68702
 
2.5%
boulevard 65125
 
2.4%
road 52885
 
1.9%
place 32356
 
1.2%
parkway 25218
 
0.9%
3 18036
 
0.7%
park 16679
 
0.6%
Other values (5427) 1357973
49.9%

Most occurring characters

ValueCountFrequency (%)
14042126
49.3%
E 2798427
 
9.8%
T 1386719
 
4.9%
A 1350774
 
4.7%
R 1092357
 
3.8%
N 1022407
 
3.6%
S 943233
 
3.3%
U 739542
 
2.6%
V 674549
 
2.4%
O 549889
 
1.9%
Other values (66) 3911507
 
13.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator 14042126
49.3%
Uppercase Letter 13389512
47.0%
Decimal Number 1022350
 
3.6%
Lowercase Letter 57207
 
0.2%
Other Punctuation 296
 
< 0.1%
Dash Punctuation 27
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Control 2
 
< 0.1%
Math Symbol 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 2798427
20.9%
T 1386719
10.4%
A 1350774
10.1%
R 1092357
 
8.2%
N 1022407
 
7.6%
S 943233
 
7.0%
U 739542
 
5.5%
V 674549
 
5.0%
O 549889
 
4.1%
L 415765
 
3.1%
Other values (16) 2415850
18.0%
Lowercase Letter
ValueCountFrequency (%)
e 10693
18.7%
t 5976
10.4%
a 5625
9.8%
r 4693
 
8.2%
n 4052
 
7.1%
s 3760
 
6.6%
o 2730
 
4.8%
v 2670
 
4.7%
u 2344
 
4.1%
l 2046
 
3.6%
Other values (16) 12618
22.1%
Decimal Number
ValueCountFrequency (%)
1 226026
22.1%
2 120574
11.8%
3 112276
11.0%
4 92232
9.0%
5 92165
9.0%
8 81356
 
8.0%
7 81189
 
7.9%
6 80691
 
7.9%
9 70153
 
6.9%
0 65688
 
6.4%
Other Punctuation
ValueCountFrequency (%)
/ 124
41.9%
. 66
22.3%
' 51
17.2%
& 49
 
16.6%
? 3
 
1.0%
, 3
 
1.0%
Space Separator
ValueCountFrequency (%)
14042126
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 27
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Control
ValueCountFrequency (%)
 2
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15064811
52.8%
Latin 13446719
47.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 2798427
20.8%
T 1386719
10.3%
A 1350774
10.0%
R 1092357
 
8.1%
N 1022407
 
7.6%
S 943233
 
7.0%
U 739542
 
5.5%
V 674549
 
5.0%
O 549889
 
4.1%
L 415765
 
3.1%
Other values (42) 2473057
18.4%
Common
ValueCountFrequency (%)
14042126
93.2%
1 226026
 
1.5%
2 120574
 
0.8%
3 112276
 
0.7%
4 92232
 
0.6%
5 92165
 
0.6%
8 81356
 
0.5%
7 81189
 
0.5%
6 80691
 
0.5%
9 70153
 
0.5%
Other values (14) 66023
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28511529
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14042126
49.3%
E 2798427
 
9.8%
T 1386719
 
4.9%
A 1350774
 
4.7%
R 1092357
 
3.8%
N 1022407
 
3.6%
S 943233
 
3.3%
U 739542
 
2.6%
V 674549
 
2.4%
O 549889
 
1.9%
Other values (65) 3911506
 
13.7%
Specials
ValueCountFrequency (%)
1
100.0%

OFF STREET NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 202042
Distinct (%): 64.4%
Missing: 1636131
Missing (%): 83.9%
Memory size: 14.9 MiB
772 EDGEWATER ROAD
 
402
110-00 ROCKAWAY BOULEVARD
 
261
2800 VICTORY BOULEVARD
 
236
2655 RICHMOND AVENUE
 
169
2100 BARTOW AVENUE
 
167
Other values (202037)
312264 

Length

Max length40
Median length40
Mean length37.427258
Min length8

Characters and Unicode

Total characters11733408
Distinct characters84
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique157896 ?
Unique (%)50.4%

Sample

1st row1211 LORING AVENUE
2nd row344 BAYCHESTER AVENUE
3rd row2047 PITKIN AVENUE
4th row480 DEAN STREET
5th row878 FLATBUSH AVENUE

Common Values

ValueCountFrequency (%)
772 EDGEWATER ROAD 402
 
< 0.1%
110-00 ROCKAWAY BOULEVARD 261
 
< 0.1%
2800 VICTORY BOULEVARD 236
 
< 0.1%
2655 RICHMOND AVENUE 169
 
< 0.1%
2100 BARTOW AVENUE 167
 
< 0.1%
501 GATEWAY DRIVE 164
 
< 0.1%
PARKING LOT 110-00 ROCKAWAY BOULEVARD 150
 
< 0.1%
625 ATLANTIC AVENUE 145
 
< 0.1%
450 FLATBUSH AVENUE 145
 
< 0.1%
3 AVENUE 142
 
< 0.1%
Other values (202032) 311518
 
16.0%
(Missing) 1636131
83.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
avenue 124198
 
11.9%
street 112028
 
10.7%
east 29581
 
2.8%
west 21488
 
2.1%
boulevard 20193
 
1.9%
road 14814
 
1.4%
lot 7881
 
0.8%
parking 7267
 
0.7%
of 6872
 
0.7%
parkway 6198
 
0.6%
Other values (26931) 695441
66.5%

Most occurring characters

ValueCountFrequency (%)
6606625
56.3%
E 716201
 
6.1%
T 391510
 
3.3%
A 370198
 
3.2%
R 306759
 
2.6%
N 270609
 
2.3%
S 256593
 
2.2%
1 248197
 
2.1%
U 183288
 
1.6%
O 173041
 
1.5%
Other values (74) 2210387
 
18.8%

Most occurring categories

ValueCountFrequency (%)
Space Separator 6606625
56.3%
Uppercase Letter 3715937
31.7%
Decimal Number 1300144
 
11.1%
Dash Punctuation 74130
 
0.6%
Lowercase Letter 22371
 
0.2%
Other Punctuation 9568
 
0.1%
Open Punctuation 2311
 
< 0.1%
Close Punctuation 2300
 
< 0.1%
Modifier Symbol 16
 
< 0.1%
Connector Punctuation 3
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 716201
19.3%
T 391510
10.5%
A 370198
10.0%
R 306759
8.3%
N 270609
 
7.3%
S 256593
 
6.9%
U 183288
 
4.9%
O 173041
 
4.7%
V 171574
 
4.6%
L 130718
 
3.5%
Other values (16) 745446
20.1%
Lowercase Letter
ValueCountFrequency (%)
e 3738
16.7%
t 2639
11.8%
r 2068
9.2%
a 1942
 
8.7%
n 1479
 
6.6%
s 1457
 
6.5%
o 1175
 
5.3%
v 979
 
4.4%
d 903
 
4.0%
l 883
 
3.9%
Other values (16) 5108
22.8%
Other Punctuation
ValueCountFrequency (%)
/ 6426
67.2%
& 1740
 
18.2%
. 997
 
10.4%
@ 145
 
1.5%
, 82
 
0.9%
: 59
 
0.6%
# 54
 
0.6%
' 50
 
0.5%
* 8
 
0.1%
? 3
 
< 0.1%
Other values (2) 4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 248197
19.1%
2 168844
13.0%
0 146872
11.3%
3 132733
10.2%
5 131574
10.1%
4 116028
8.9%
6 94784
 
7.3%
7 92718
 
7.1%
8 87497
 
6.7%
9 80897
 
6.2%
Close Punctuation
ValueCountFrequency (%)
) 2299
> 99.9%
] 1
 
< 0.1%
Control
ValueCountFrequency (%)
1
50.0%
 1
50.0%
Space Separator
ValueCountFrequency (%)
6606625
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 74130
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2311
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 16
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 7995100
68.1%
Latin 3738308
31.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 716201
19.2%
T 391510
10.5%
A 370198
9.9%
R 306759
 
8.2%
N 270609
 
7.2%
S 256593
 
6.9%
U 183288
 
4.9%
O 173041
 
4.6%
V 171574
 
4.6%
L 130718
 
3.5%
Other values (42) 767817
20.5%
Common
ValueCountFrequency (%)
(space) 6606625
82.6%
1 248197
 
3.1%
2 168844
 
2.1%
0 146872
 
1.8%
3 132733
 
1.7%
5 131574
 
1.6%
4 116028
 
1.5%
6 94784
 
1.2%
7 92718
 
1.2%
8 87497
 
1.1%
Other values (22) 169228
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11733408
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 6606625
56.3%
E 716201
 
6.1%
T 391510
 
3.3%
A 370198
 
3.2%
R 306759
 
2.6%
N 270609
 
2.3%
S 256593
 
2.2%
1 248197
 
2.1%
U 183288
 
1.6%
O 173041
 
1.5%
Other values (74) 2210387
 
18.8%

NUMBER OF PERSONS INJURED
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct28
Distinct (%)< 0.1%
Missing18
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.29388617
Minimum0
Maximum43
Zeros1527469
Zeros (%)78.3%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum43
Range43
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.6854355
Coefficient of variation (CV)2.3323163
Kurtosis51.490826
Mean0.29388617
Median Absolute Deviation (MAD)0
Skewness4.3243749
Sum572964
Variance0.46982183
MonotonicityNot monotonic
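
The descriptive statistics above are linked by simple identities, which makes them easy to sanity-check. The sketch below recomputes the coefficient of variation, the variance, and the mean from the figures printed in this block. The row count (1949630) comes from the report overview, and the variable names are illustrative only, not part of the report.

```python
# Figures copied from the NUMBER OF PERSONS INJURED block of this report.
mean = 0.29388617
std = 0.6854355
total = 572964          # "Sum"
n_rows = 1949630        # "Number of observations" from the overview
n_missing = 18          # "Missing"

# Coefficient of variation is the standard deviation divided by the mean.
cv = std / mean

# Variance is the squared standard deviation.
variance = std ** 2

# The mean is the sum divided by the number of non-missing rows.
mean_from_sum = total / (n_rows - n_missing)

print(round(cv, 4), round(variance, 4), round(mean_from_sum, 5))
```

The recomputed values agree with the reported CV (2.3323163) and variance (0.46982183) up to rounding of the printed inputs.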
Histogram with fixed size bins (bins=28)

Common Values

ValueCountFrequency (%)
0 1527469
78.3%
1 327457
 
16.8%
2 61714
 
3.2%
3 20165
 
1.0%
4 7568
 
0.4%
5 2954
 
0.2%
6 1200
 
0.1%
7 528
 
< 0.1%
8 218
 
< 0.1%
9 117
 
< 0.1%
Other values (18) 222
 
< 0.1%

Extreme Values (smallest first)

ValueCountFrequency (%)
0 1527469
78.3%
1 327457
 
16.8%
2 61714
 
3.2%
3 20165
 
1.0%
4 7568
 
0.4%
5 2954
 
0.2%
6 1200
 
0.1%
7 528
 
< 0.1%
8 218
 
< 0.1%
9 117
 
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
43 1
 
< 0.1%
40 1
 
< 0.1%
32 1
 
< 0.1%
31 1
 
< 0.1%
27 1
 
< 0.1%
24 3
< 0.1%
22 3
< 0.1%
20 2
 
< 0.1%
19 4
< 0.1%
18 5
< 0.1%

NUMBER OF PERSONS KILLED
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct7
Distinct (%)< 0.1%
Missing31
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.001400288
Minimum0
Maximum8
Zeros1946976
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum8
Range8
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.039450084
Coefficient of variation (CV)28.172837
Kurtosis2101.6931
Mean0.001400288
Median Absolute Deviation (MAD)0
Skewness34.881399
Sum2730
Variance0.0015563092
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)

Common Values

ValueCountFrequency (%)
0 1946976
99.9%
1 2542
 
0.1%
2 65
 
< 0.1%
3 11
 
< 0.1%
4 3
 
< 0.1%
8 1
 
< 0.1%
5 1
 
< 0.1%
(Missing) 31
 
< 0.1%

Extreme Values (smallest first)

ValueCountFrequency (%)
0 1946976
99.9%
1 2542
 
0.1%
2 65
 
< 0.1%
3 11
 
< 0.1%
4 3
 
< 0.1%
5 1
 
< 0.1%
8 1
 
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
8 1
 
< 0.1%
5 1
 
< 0.1%
4 3
 
< 0.1%
3 11
 
< 0.1%
2 65
 
< 0.1%
1 2542
 
0.1%
0 1946976
99.9%

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct13
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.05384355
Minimum0
Maximum27
Zeros1848975
Zeros (%)94.8%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum27
Range27
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.23824339
Coefficient of variation (CV)4.4247341
Kurtosis127.42717
Mean0.05384355
Median Absolute Deviation (MAD)0
Skewness5.7195638
Sum104975
Variance0.056759913
MonotonicityNot monotonic
Histogram with fixed size bins (bins=13)

Common Values

ValueCountFrequency (%)
0 1848975
94.8%
1 96974
 
5.0%
2 3251
 
0.2%
3 332
 
< 0.1%
4 55
 
< 0.1%
5 23
 
< 0.1%
6 11
 
< 0.1%
7 3
 
< 0.1%
9 2
 
< 0.1%
27 1
 
< 0.1%
Other values (3) 3
 
< 0.1%

Extreme Values (smallest first)

ValueCountFrequency (%)
0 1848975
94.8%
1 96974
 
5.0%
2 3251
 
0.2%
3 332
 
< 0.1%
4 55
 
< 0.1%
5 23
 
< 0.1%
6 11
 
< 0.1%
7 3
 
< 0.1%
8 1
 
< 0.1%
9 2
 
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
27 1
 
< 0.1%
15 1
 
< 0.1%
13 1
 
< 0.1%
9 2
 
< 0.1%
8 1
 
< 0.1%
7 3
 
< 0.1%
6 11
 
< 0.1%
5 23
 
< 0.1%
4 55
 
< 0.1%
3 332
< 0.1%

NUMBER OF PEDESTRIANS KILLED
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size14.9 MiB
0
1948258 
1
 
1359
2
 
12
6
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1949630
Distinct characters4
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1949630
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 1949630
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1949630
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1948258
99.9%
1 1359
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

NUMBER OF CYCLIST INJURED
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size14.9 MiB
0
1900803 
1
 
48289
2
 
516
3
 
21
4
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1949630
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1949630
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 1949630
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1949630
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1900803
97.5%
1 48289
 
2.5%
2 516
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%

NUMBER OF CYCLIST KILLED
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size14.9 MiB
0
1949429 
1
 
200
2
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1949630
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1949630
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 1949630
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1949630
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1949429
> 99.9%
1 200
 
< 0.1%
2 1
 
< 0.1%

NUMBER OF MOTORIST INJURED
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct28
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.21231875
Minimum0
Maximum43
Zeros1678914
Zeros (%)86.1%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum43
Range43
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.64677755
Coefficient of variation (CV)3.0462574
Kurtosis64.177399
Mean0.21231875
Median Absolute Deviation (MAD)0
Skewness5.2008158
Sum413943
Variance0.4183212
MonotonicityNot monotonic
Histogram with fixed size bins (bins=28)

Common Values

ValueCountFrequency (%)
0 1678914
86.1%
1 182107
 
9.3%
2 56524
 
2.9%
3 19562
 
1.0%
4 7416
 
0.4%
5 2907
 
0.1%
6 1159
 
0.1%
7 504
 
< 0.1%
8 210
 
< 0.1%
9 113
 
< 0.1%
Other values (18) 214
 
< 0.1%

Extreme Values (smallest first)

ValueCountFrequency (%)
0 1678914
86.1%
1 182107
 
9.3%
2 56524
 
2.9%
3 19562
 
1.0%
4 7416
 
0.4%
5 2907
 
0.1%
6 1159
 
0.1%
7 504
 
< 0.1%
8 210
 
< 0.1%
9 113
 
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
43 1
 
< 0.1%
40 1
 
< 0.1%
31 1
 
< 0.1%
30 1
 
< 0.1%
24 3
< 0.1%
22 2
 
< 0.1%
21 1
 
< 0.1%
20 2
 
< 0.1%
19 3
< 0.1%
18 5
< 0.1%

NUMBER OF MOTORIST KILLED
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.00056728713
Minimum0
Maximum5
Zeros1948606
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum5
Range5
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.025974572
Coefficient of variation (CV)45.787347
Kurtosis4260.525
Mean0.00056728713
Median Absolute Deviation (MAD)0
Skewness55.837664
Sum1106
Variance0.0006746784
MonotonicityNot monotonic
Histogram with fixed size bins (bins=6)

Common Values

ValueCountFrequency (%)
0 1948606
99.9%
1 960
 
< 0.1%
2 50
 
< 0.1%
3 11
 
< 0.1%
4 2
 
< 0.1%
5 1
 
< 0.1%

Extreme Values (smallest first)

ValueCountFrequency (%)
0 1948606
99.9%
1 960
 
< 0.1%
2 50
 
< 0.1%
3 11
 
< 0.1%
4 2
 
< 0.1%
5 1
 
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
5 1
 
< 0.1%
4 2
 
< 0.1%
3 11
 
< 0.1%
2 50
 
< 0.1%
1 960
 
< 0.1%
0 1948606
99.9%

CONTRIBUTING FACTOR VEHICLE 1
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct61
Distinct (%)< 0.1%
Missing5919
Missing (%)0.3%
Memory size14.9 MiB
Unspecified
676629 
Driver Inattention/Distraction
384021 
Failure to Yield Right-of-Way
114498 
Following Too Closely
103076 
Backing Unsafely
72926 
Other values (56)
592561 

Length

Max length53
Median length43
Mean length19.383688
Min length1

Characters and Unicode

Total characters37676287
Distinct characters55
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
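
As an illustration of the character properties mentioned above, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` and the three sample strings (taken from this variable's common values) are illustrative; this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Ll', 'Zs')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# A tiny sample of category labels from this column.
sample = [
    "Unspecified",
    "Driver Inattention/Distraction",
    "Failure to Yield Right-of-Way",
]
counts = category_counts(sample)

# Lu = Uppercase Letter, Ll = Lowercase Letter, Zs = Space Separator,
# Po = Other Punctuation ("/"), Pd = Dash Punctuation ("-").
for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes map onto the category names used in the tables of this report, for example `Zs` for "Space Separator" and `Pd` for "Dash Punctuation".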

Unique

Unique0
Unique (%)0.0%

Sample

1st rowAggressive Driving/Road Rage
2nd rowPavement Slippery
3rd rowFollowing Too Closely
4th rowUnspecified
5th rowUnspecified

Common Values

ValueCountFrequency (%)
Unspecified 676629
34.7%
Driver Inattention/Distraction 384021
19.7%
Failure to Yield Right-of-Way 114498
 
5.9%
Following Too Closely 103076
 
5.3%
Backing Unsafely 72926
 
3.7%
Other Vehicular 60620
 
3.1%
Passing or Lane Usage Improper 52345
 
2.7%
Turning Improperly 48314
 
2.5%
Passing Too Closely 47509
 
2.4%
Fatigued/Drowsy 47273
 
2.4%
Other values (51) 336500
17.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 676629
17.6%
driver 413817
 
10.8%
inattention/distraction 384021
 
10.0%
too 150585
 
3.9%
closely 150585
 
3.9%
to 137701
 
3.6%
failure 120287
 
3.1%
yield 114498
 
3.0%
right-of-way 114498
 
3.0%
following 103076
 
2.7%
Other values (96) 1478248
38.5%

Most occurring characters

ValueCountFrequency (%)
i 4261437
 
11.3%
e 3847900
 
10.2%
n 3271990
 
8.7%
t 2597964
 
6.9%
o 2207352
 
5.9%
r 2192939
 
5.8%
s 1969901
 
5.2%
(space) 1900234
 
5.0%
a 1846471
 
4.9%
c 1467914
 
3.9%
Other values (45) 12112185
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 30798764
81.7%
Uppercase Letter 4255134
 
11.3%
Space Separator 1900234
 
5.0%
Other Punctuation 487265
 
1.3%
Dash Punctuation 230638
 
0.6%
Open Punctuation 2020
 
< 0.1%
Close Punctuation 2020
 
< 0.1%
Decimal Number 212
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 4261437
13.8%
e 3847900
12.5%
n 3271990
10.6%
t 2597964
8.4%
o 2207352
 
7.2%
r 2192939
 
7.1%
s 1969901
 
6.4%
a 1846471
 
6.0%
c 1467914
 
4.8%
l 1159451
 
3.8%
Other values (15) 5975445
19.4%
Uppercase Letter
ValueCountFrequency (%)
D 936708
22.0%
U 884618
20.8%
I 544123
12.8%
F 278089
 
6.5%
C 264871
 
6.2%
T 235626
 
5.5%
P 171170
 
4.0%
R 156228
 
3.7%
L 124678
 
2.9%
W 115534
 
2.7%
Other values (12) 543489
12.8%
Decimal Number
ValueCountFrequency (%)
8 101
47.6%
0 101
47.6%
1 10
 
4.7%
Space Separator
ValueCountFrequency (%)
(space) 1900234
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 487265
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 230638
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2020
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2020
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 35053898
93.0%
Common 2622389
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 4261437
12.2%
e 3847900
 
11.0%
n 3271990
 
9.3%
t 2597964
 
7.4%
o 2207352
 
6.3%
r 2192939
 
6.3%
s 1969901
 
5.6%
a 1846471
 
5.3%
c 1467914
 
4.2%
l 1159451
 
3.3%
Other values (37) 10230579
29.2%
Common
ValueCountFrequency (%)
(space) 1900234
72.5%
/ 487265
 
18.6%
- 230638
 
8.8%
( 2020
 
0.1%
) 2020
 
0.1%
8 101
 
< 0.1%
0 101
 
< 0.1%
1 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 37676287
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 4261437
 
11.3%
e 3847900
 
10.2%
n 3271990
 
8.7%
t 2597964
 
6.9%
o 2207352
 
5.9%
r 2192939
 
5.8%
s 1969901
 
5.2%
(space) 1900234
 
5.0%
a 1846471
 
4.9%
c 1467914
 
3.9%
Other values (45) 12112185
32.1%

CONTRIBUTING FACTOR VEHICLE 2
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct61
Distinct (%)< 0.1%
Missing291964
Missing (%)15.0%
Memory size14.9 MiB
Unspecified
1395623 
Driver Inattention/Distraction
 
88639
Other Vehicular
 
30554
Following Too Closely
 
17526
Failure to Yield Right-of-Way
 
16327
Other values (56)
 
108997

Length

Max length53
Median length11
Mean length13.038235
Min length1

Characters and Unicode

Total characters21613039
Distinct characters55
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified

Common Values

ValueCountFrequency (%)
Unspecified 1395623
71.6%
Driver Inattention/Distraction 88639
 
4.5%
Other Vehicular 30554
 
1.6%
Following Too Closely 17526
 
0.9%
Failure to Yield Right-of-Way 16327
 
0.8%
Passing or Lane Usage Improper 11896
 
0.6%
Fatigued/Drowsy 10833
 
0.6%
Turning Improperly 8458
 
0.4%
Passing Too Closely 8177
 
0.4%
Backing Unsafely 7679
 
0.4%
Other values (51) 61954
 
3.2%
(Missing) 291964
 
15.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 1395623
68.7%
driver 94955
 
4.7%
inattention/distraction 88639
 
4.4%
other 31611
 
1.6%
vehicular 30554
 
1.5%
too 25703
 
1.3%
closely 25703
 
1.3%
to 20497
 
1.0%
passing 20073
 
1.0%
lane 18795
 
0.9%
Other values (96) 279382
 
13.8%

Most occurring characters

ValueCountFrequency (%)
i 3409113
15.8%
e 3316582
15.3%
n 1936211
9.0%
s 1661264
7.7%
c 1575269
7.3%
d 1464569
6.8%
p 1461288
6.8%
f 1447832
6.7%
U 1429652
6.6%
t 584113
 
2.7%
Other values (45) 3327146
15.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18965693
87.8%
Uppercase Letter 2127710
 
9.8%
Space Separator 373869
 
1.7%
Other Punctuation 111920
 
0.5%
Dash Punctuation 33242
 
0.2%
Open Punctuation 278
 
< 0.1%
Close Punctuation 278
 
< 0.1%
Decimal Number 49
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 3409113
18.0%
e 3316582
17.5%
n 1936211
10.2%
s 1661264
8.8%
c 1575269
8.3%
d 1464569
7.7%
p 1461288
7.7%
f 1447832
7.6%
t 584113
 
3.1%
r 508927
 
2.7%
Other values (15) 1600525
8.4%
Uppercase Letter
ValueCountFrequency (%)
U 1429652
67.2%
D 211693
 
9.9%
I 119019
 
5.6%
C 49185
 
2.3%
F 46042
 
2.2%
O 41973
 
2.0%
T 41533
 
2.0%
V 39291
 
1.8%
P 35055
 
1.6%
L 27010
 
1.3%
Other values (12) 87257
 
4.1%
Decimal Number
ValueCountFrequency (%)
8 22
44.9%
0 22
44.9%
1 5
 
10.2%
Space Separator
ValueCountFrequency (%)
(space) 373869
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 111920
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 33242
100.0%
Open Punctuation
ValueCountFrequency (%)
( 278
100.0%
Close Punctuation
ValueCountFrequency (%)
) 278
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21093403
97.6%
Common 519636
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 3409113
16.2%
e 3316582
15.7%
n 1936211
9.2%
s 1661264
7.9%
c 1575269
7.5%
d 1464569
6.9%
p 1461288
6.9%
f 1447832
6.9%
U 1429652
6.8%
t 584113
 
2.8%
Other values (37) 2807510
13.3%
Common
ValueCountFrequency (%)
(space) 373869
71.9%
/ 111920
 
21.5%
- 33242
 
6.4%
( 278
 
0.1%
) 278
 
0.1%
8 22
 
< 0.1%
0 22
 
< 0.1%
1 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21613039
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 3409113
15.8%
e 3316582
15.3%
n 1936211
9.0%
s 1661264
7.7%
c 1575269
7.3%
d 1464569
6.8%
p 1461288
6.8%
f 1447832
6.7%
U 1429652
6.6%
t 584113
 
2.7%
Other values (45) 3327146
15.4%

CONTRIBUTING FACTOR VEHICLE 3
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct51
Distinct (%)< 0.1%
Missing1812944
Missing (%)93.0%
Memory size14.9 MiB
Unspecified
127424 
Other Vehicular
 
2534
Driver Inattention/Distraction
 
1804
Following Too Closely
 
1746
Fatigued/Drowsy
 
853
Other values (46)
 
2325

Length

Max length53
Median length11
Mean length11.655334
Min length1

Characters and Unicode

Total characters1593121
Distinct characters55
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5
Unique (%)< 0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified

Common Values

ValueCountFrequency (%)
Unspecified 127424
 
6.5%
Other Vehicular 2534
 
0.1%
Driver Inattention/Distraction 1804
 
0.1%
Following Too Closely 1746
 
0.1%
Fatigued/Drowsy 853
 
< 0.1%
Pavement Slippery 371
 
< 0.1%
Reaction to Uninvolved Vehicle 195
 
< 0.1%
Driver Inexperience 169
 
< 0.1%
Outside Car Distraction 159
 
< 0.1%
Traffic Control Disregarded 150
 
< 0.1%
Other values (41) 1281
 
0.1%
(Missing) 1812944
93.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 127424
85.9%
other 2574
 
1.7%
vehicular 2534
 
1.7%
driver 1973
 
1.3%
inattention/distraction 1804
 
1.2%
too 1792
 
1.2%
closely 1792
 
1.2%
following 1746
 
1.2%
fatigued/drowsy 853
 
0.6%
pavement 385
 
0.3%
Other values (79) 5456
 
3.7%

Most occurring characters

ValueCountFrequency (%)
e 272243
17.1%
i 271136
17.0%
n 139663
8.8%
s 133824
8.4%
c 133326
8.4%
d 129434
8.1%
p 128948
8.1%
f 128260
8.1%
U 128006
8.0%
o 15718
 
1.0%
Other values (45) 112563
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1427288
89.6%
Uppercase Letter 150937
 
9.5%
Space Separator 11647
 
0.7%
Other Punctuation 2921
 
0.2%
Dash Punctuation 297
 
< 0.1%
Open Punctuation 12
 
< 0.1%
Close Punctuation 12
 
< 0.1%
Decimal Number 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 272243
19.1%
i 271136
19.0%
n 139663
9.8%
s 133824
9.4%
c 133326
9.3%
d 129434
9.1%
p 128948
9.0%
f 128260
9.0%
o 15718
 
1.1%
t 14904
 
1.0%
Other values (15) 59832
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
U 128006
84.8%
D 5205
 
3.4%
O 2879
 
1.9%
F 2836
 
1.9%
V 2795
 
1.9%
I 2280
 
1.5%
C 2247
 
1.5%
T 2039
 
1.4%
P 647
 
0.4%
S 506
 
0.3%
Other values (12) 1497
 
1.0%
Decimal Number
ValueCountFrequency (%)
8 3
42.9%
0 3
42.9%
1 1
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 11647
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2921
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 297
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1578225
99.1%
Common 14896
 
0.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 272243
17.2%
i 271136
17.2%
n 139663
8.8%
s 133824
8.5%
c 133326
8.4%
d 129434
8.2%
p 128948
8.2%
f 128260
8.1%
U 128006
8.1%
o 15718
 
1.0%
Other values (37) 97667
 
6.2%
Common
ValueCountFrequency (%)
(space) 11647
78.2%
/ 2921
 
19.6%
- 297
 
2.0%
( 12
 
0.1%
) 12
 
0.1%
8 3
 
< 0.1%
0 3
 
< 0.1%
1 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1593121
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 272243
17.1%
i 271136
17.0%
n 139663
8.8%
s 133824
8.4%
c 133326
8.4%
d 129434
8.1%
p 128948
8.1%
f 128260
8.1%
U 128006
8.0%
o 15718
 
1.0%
Other values (45) 112563
7.1%

CONTRIBUTING FACTOR VEHICLE 4
Categorical

HIGH CORRELATION
MISSING

Distinct40
Distinct (%)0.1%
Missing1919228
Missing (%)98.4%
Memory size14.9 MiB
Unspecified
28695 
Other Vehicular
 
542
Following Too Closely
 
340
Driver Inattention/Distraction
 
248
Fatigued/Drowsy
 
170
Other values (35)
 
407

Length

Max length43
Median length11
Mean length11.483784
Min length5

Characters and Unicode

Total characters349130
Distinct characters50
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified

Common Values

ValueCountFrequency (%)
Unspecified 28695
 
1.5%
Other Vehicular 542
 
< 0.1%
Following Too Closely 340
 
< 0.1%
Driver Inattention/Distraction 248
 
< 0.1%
Fatigued/Drowsy 170
 
< 0.1%
Pavement Slippery 106
 
< 0.1%
Reaction to Uninvolved Vehicle 38
 
< 0.1%
Outside Car Distraction 27
 
< 0.1%
Unsafe Speed 26
 
< 0.1%
Driver Inexperience 24
 
< 0.1%
Other values (30) 186
 
< 0.1%
(Missing) 1919228
98.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 28695
88.3%
other 551
 
1.7%
vehicular 542
 
1.7%
too 345
 
1.1%
closely 345
 
1.1%
following 340
 
1.0%
driver 272
 
0.8%
inattention/distraction 248
 
0.8%
fatigued/drowsy 170
 
0.5%
pavement 109
 
0.3%
Other values (63) 871
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 60593
17.4%
i 60060
17.2%
n 30541
8.7%
c 29741
8.5%
s 29724
8.5%
d 29035
8.3%
p 29018
8.3%
f 28810
8.3%
U 28783
8.2%
o 2743
 
0.8%
Other values (40) 20082
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 313669
89.8%
Uppercase Letter 32883
 
9.4%
Space Separator 2086
 
0.6%
Other Punctuation 450
 
0.1%
Dash Punctuation 34
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 60593
19.3%
i 60060
19.1%
n 30541
9.7%
c 29741
9.5%
s 29724
9.5%
d 29035
9.3%
p 29018
9.3%
f 28810
9.2%
o 2743
 
0.9%
r 2490
 
0.8%
Other values (14) 10914
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 28783
87.5%
D 794
 
2.4%
O 599
 
1.8%
V 585
 
1.8%
F 554
 
1.7%
C 403
 
1.2%
T 374
 
1.1%
I 313
 
1.0%
S 133
 
0.4%
P 130
 
0.4%
Other values (11) 215
 
0.7%
Space Separator
ValueCountFrequency (%)
(space) 2086
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 450
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 346552
99.3%
Common 2578
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 60593
17.5%
i 60060
17.3%
n 30541
8.8%
c 29741
8.6%
s 29724
8.6%
d 29035
8.4%
p 29018
8.4%
f 28810
8.3%
U 28783
8.3%
o 2743
 
0.8%
Other values (35) 17504
 
5.1%
Common
ValueCountFrequency (%)
(space) 2086
80.9%
/ 450
 
17.5%
- 34
 
1.3%
( 4
 
0.2%
) 4
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 349130
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 60593
17.4%
i 60060
17.2%
n 30541
8.7%
c 29741
8.5%
s 29724
8.5%
d 29035
8.3%
p 29018
8.3%
f 28810
8.3%
U 28783
8.2%
o 2743
 
0.8%
Other values (40) 20082
 
5.8%

CONTRIBUTING FACTOR VEHICLE 5
Categorical

HIGH CORRELATION
MISSING

Distinct29
Distinct (%)0.4%
Missing1941480
Missing (%)99.6%
Memory size14.9 MiB
Unspecified
7686 
Other Vehicular
 
158
Following Too Closely
 
81
Driver Inattention/Distraction
 
60
Pavement Slippery
 
44
Other values (24)
 
121

Length

Max length43
Median length11
Mean length11.467362
Min length5

Characters and Unicode

Total characters93459
Distinct characters49
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique10
Unique (%)0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified

Common Values

ValueCountFrequency (%)
Unspecified 7686
 
0.4%
Other Vehicular 158
 
< 0.1%
Following Too Closely 81
 
< 0.1%
Driver Inattention/Distraction 60
 
< 0.1%
Pavement Slippery 44
 
< 0.1%
Fatigued/Drowsy 41
 
< 0.1%
Reaction to Uninvolved Vehicle 11
 
< 0.1%
Alcohol Involvement 10
 
< 0.1%
Driver Inexperience 9
 
< 0.1%
Unsafe Speed 7
 
< 0.1%
Other values (19) 43
 
< 0.1%
(Missing) 1941480
99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 7686
88.3%
other 160
 
1.8%
vehicular 158
 
1.8%
too 83
 
1.0%
closely 83
 
1.0%
following 81
 
0.9%
driver 69
 
0.8%
inattention/distraction 60
 
0.7%
pavement 45
 
0.5%
slippery 44
 
0.5%
Other values (46) 232
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 16277
17.4%
i 16069
17.2%
n 8162
8.7%
c 7973
8.5%
s 7926
8.5%
p 7799
8.3%
d 7765
8.3%
f 7709
8.2%
U 7706
8.2%
o 677
 
0.7%
Other values (39) 5396
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 83985
89.9%
Uppercase Letter 8797
 
9.4%
Space Separator 551
 
0.6%
Other Punctuation 111
 
0.1%
Dash Punctuation 11
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 16277
19.4%
i 16069
19.1%
n 8162
9.7%
c 7973
9.5%
s 7926
9.4%
p 7799
9.3%
d 7765
9.2%
f 7709
9.2%
o 677
 
0.8%
r 677
 
0.8%
Other values (14) 2951
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 7706
87.6%
D 195
 
2.2%
O 174
 
2.0%
V 169
 
1.9%
F 134
 
1.5%
C 94
 
1.1%
T 88
 
1.0%
I 83
 
0.9%
S 52
 
0.6%
P 48
 
0.5%
Other values (10) 54
 
0.6%
Space Separator
ValueCountFrequency (%)
(space) 551
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 111
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 92782
99.3%
Common 677
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 16277
17.5%
i 16069
17.3%
n 8162
8.8%
c 7973
8.6%
s 7926
8.5%
p 7799
8.4%
d 7765
8.4%
f 7709
8.3%
U 7706
8.3%
o 677
 
0.7%
Other values (34) 4719
 
5.1%
Common
ValueCountFrequency (%)
(space) 551
81.4%
/ 111
 
16.4%
- 11
 
1.6%
( 2
 
0.3%
) 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 93459
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 16277
17.4%
i 16069
17.2%
n 8162
8.7%
c 7973
8.5%
s 7926
8.5%
p 7799
8.3%
d 7765
8.3%
f 7709
8.2%
U 7706
8.2%
o 677
 
0.7%
Other values (39) 5396
 
5.8%

COLLISION_ID
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct1949630
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3063521.4
Minimum22
Maximum4586417
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.9 MiB

Quantile statistics

Minimum22
5-th percentile98314.45
Q13122119.2
median3611018.5
Q34098681.8
95-th percentile4488695.5
Maximum4586417
Range4586395
Interquartile range (IQR)976562.5

Descriptive statistics

Standard deviation1503058.7
Coefficient of variation (CV)0.49063103
Kurtosis-0.2122234
Mean3063521.4
Median Absolute Deviation (MAD)488282
Skewness-1.1788488
Sum5.9727332 × 10^12
Variance2.2591853 × 10^12
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common Values

ValueCountFrequency (%)
4455765 1
 
< 0.1%
3269862 1
 
< 0.1%
3265648 1
 
< 0.1%
3276194 1
 
< 0.1%
3266120 1
 
< 0.1%
3274634 1
 
< 0.1%
3274205 1
 
< 0.1%
3267013 1
 
< 0.1%
3266847 1
 
< 0.1%
3273878 1
 
< 0.1%
Other values (1949620) 1949620
> 99.9%

Extreme Values (smallest first)

ValueCountFrequency (%)
22 1
< 0.1%
23 1
< 0.1%
24 1
< 0.1%
25 1
< 0.1%
26 1
< 0.1%
27 1
< 0.1%
28 1
< 0.1%
29 1
< 0.1%
30 1
< 0.1%
31 1
< 0.1%

Extreme Values (largest first)

ValueCountFrequency (%)
4586417 1
< 0.1%
4586409 1
< 0.1%
4586408 1
< 0.1%
4586407 1
< 0.1%
4586403 1
< 0.1%
4586396 1
< 0.1%
4586395 1
< 0.1%
4586394 1
< 0.1%
4586388 1
< 0.1%
4586386 1
< 0.1%

VEHICLE TYPE CODE 1
Categorical

HIGH CARDINALITY

Distinct1450
Distinct (%)0.1%
Missing11591
Missing (%)0.6%
Memory size14.9 MiB
Sedan
518915 
PASSENGER VEHICLE
416206 
Station Wagon/Sport Utility Vehicle
409896 
SPORT UTILITY / STATION WAGON
180291 
Taxi
 
48192
Other values (1445)
364539 

Length

Max length 38
Median length 30
Mean length 16.938741
Min length 1

Characters and Unicode

Total characters 32827941
Distinct characters 75
Distinct categories 11
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 865
Unique (%) < 0.1%

Sample

1st row Sedan
2nd row Sedan
3rd row Sedan
4th row Sedan
5th row Dump

Common Values

Value Count Frequency (%)
Sedan 518915 26.6%
PASSENGER VEHICLE 416206 21.3%
Station Wagon/Sport Utility Vehicle 409896 21.0%
SPORT UTILITY / STATION WAGON 180291 9.2%
Taxi 48192 2.5%
4 dr sedan 40135 2.1%
TAXI 31911 1.6%
Pick-up Truck 31836 1.6%
VAN 25266 1.3%
OTHER 22967 1.2%
Other values (1440) 212424 10.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vehicle 836673
18.1%
utility 590216
12.8%
station 590187
12.8%
sedan 561711
12.2%
passenger 416215
9.0%
wagon/sport 409896
8.9%
181483
 
3.9%
wagon 180345
 
3.9%
sport 180291
 
3.9%
taxi 80105
 
1.7%
Other values (876) 590717
12.8%

Most occurring characters

ValueCountFrequency (%)
2693020
 
8.2%
S 2589747
 
7.9%
t 2078915
 
6.3%
E 1816789
 
5.5%
i 1752600
 
5.3%
a 1467890
 
4.5%
e 1455005
 
4.4%
n 1401162
 
4.3%
o 1295948
 
3.9%
T 1130789
 
3.4%
Other values (65) 15146076
46.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 15231229
46.4%
Lowercase Letter 14138588
43.1%
Space Separator 2693020
 
8.2%
Other Punctuation 591426
 
1.8%
Decimal Number 70905
 
0.2%
Dash Punctuation 47541
 
0.1%
Open Punctuation 27615
 
0.1%
Close Punctuation 27613
 
0.1%
Modifier Symbol 2
 
< 0.1%
Other Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2589747
17.0%
E 1816789
11.9%
T 1130789
 
7.4%
I 1051859
 
6.9%
V 909082
 
6.0%
A 874102
 
5.7%
N 865163
 
5.7%
R 723060
 
4.7%
L 667346
 
4.4%
P 654550
 
4.3%
Other values (16) 3948742
25.9%
Lowercase Letter
ValueCountFrequency (%)
t 2078915
14.7%
i 1752600
12.4%
a 1467890
10.4%
e 1455005
10.3%
n 1401162
9.9%
o 1295948
9.2%
l 852499
6.0%
d 608967
 
4.3%
r 570205
 
4.0%
c 542813
 
3.8%
Other values (15) 2112584
14.9%
Decimal Number
ValueCountFrequency (%)
4 53373
75.3%
6 14402
 
20.3%
2 2674
 
3.8%
3 303
 
0.4%
1 47
 
0.1%
5 39
 
0.1%
0 31
 
< 0.1%
9 20
 
< 0.1%
8 9
 
< 0.1%
7 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 591403
> 99.9%
. 12
 
< 0.1%
# 4
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
& 1
 
< 0.1%
? 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2693020
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 47541
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27615
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27613
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 29369817
89.5%
Common 3458124
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2589747
 
8.8%
t 2078915
 
7.1%
E 1816789
 
6.2%
i 1752600
 
6.0%
a 1467890
 
5.0%
e 1455005
 
5.0%
n 1401162
 
4.8%
o 1295948
 
4.4%
T 1130789
 
3.9%
I 1051859
 
3.6%
Other values (41) 13329113
45.4%
Common
ValueCountFrequency (%)
2693020
77.9%
/ 591403
 
17.1%
4 53373
 
1.5%
- 47541
 
1.4%
( 27615
 
0.8%
) 27613
 
0.8%
6 14402
 
0.4%
2 2674
 
0.1%
3 303
 
< 0.1%
1 47
 
< 0.1%
Other values (14) 133
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32827940
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2693020
 
8.2%
S 2589747
 
7.9%
t 2078915
 
6.3%
E 1816789
 
5.5%
i 1752600
 
5.3%
a 1467890
 
4.5%
e 1455005
 
4.4%
n 1401162
 
4.3%
o 1295948
 
3.9%
T 1130789
 
3.4%
Other values (64) 15146075
46.1%
Specials
ValueCountFrequency (%)
1
100.0%

VEHICLE TYPE CODE 2
Categorical

HIGH CARDINALITY
MISSING

Distinct 1622
Distinct (%) 0.1%
Missing 353893
Missing (%) 18.2%
Memory size 14.9 MiB
Sedan 370374
PASSENGER VEHICLE 318607
Station Wagon/Sport Utility Vehicle 301180
SPORT UTILITY / STATION WAGON 140204
UNKNOWN 81487
Other values (1617) 383885

Length

Max length 38
Median length 30
Mean length 16.145074
Min length 1

Characters and Unicode

Total characters 25763292
Distinct characters 72
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 962
Unique (%) 0.1%

Sample

1st row Sedan
2nd row Pick-up Truck
3rd row Sedan
4th row Tractor Truck Diesel
5th row Sedan

Common Values

Value Count Frequency (%)
Sedan 370374 19.0%
PASSENGER VEHICLE 318607 16.3%
Station Wagon/Sport Utility Vehicle 301180 15.4%
SPORT UTILITY / STATION WAGON 140204 7.2%
UNKNOWN 81487 4.2%
Taxi 35848 1.8%
4 dr sedan 30069 1.5%
Pick-up Truck 29118 1.5%
TAXI 27702 1.4%
Bike 27177 1.4%
Other values (1612) 233971 12.0%
(Missing) 353893 18.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vehicle 628358
17.2%
utility 441402
12.1%
station 441384
12.1%
sedan 402386
11.0%
passenger 318609
8.7%
wagon/sport 301180
8.2%
141364
 
3.9%
wagon 140252
 
3.8%
sport 140204
 
3.8%
unknown 81557
 
2.2%
Other values (920) 625895
17.1%

Most occurring characters

ValueCountFrequency (%)
2079819
 
8.1%
S 1945863
 
7.6%
t 1534332
 
6.0%
E 1435030
 
5.6%
i 1318066
 
5.1%
e 1090694
 
4.2%
a 1076590
 
4.2%
n 1021750
 
4.0%
o 971714
 
3.8%
T 910049
 
3.5%
Other values (62) 12379385
48.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 12465872
48.4%
Lowercase Letter 10615612
41.2%
Space Separator 2079819
 
8.1%
Other Punctuation 442606
 
1.7%
Decimal Number 59094
 
0.2%
Dash Punctuation 46987
 
0.2%
Open Punctuation 26651
 
0.1%
Close Punctuation 26649
 
0.1%
Modifier Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 1945863
15.6%
E 1435030
11.5%
T 910049
 
7.3%
N 869097
 
7.0%
I 841925
 
6.8%
V 694860
 
5.6%
A 684573
 
5.5%
O 587595
 
4.7%
R 577370
 
4.6%
U 559621
 
4.5%
Other values (16) 3359889
27.0%
Lowercase Letter
ValueCountFrequency (%)
t 1534332
14.5%
i 1318066
12.4%
e 1090694
10.3%
a 1076590
10.1%
n 1021750
9.6%
o 971714
9.2%
l 631114
 
5.9%
r 449414
 
4.2%
d 439383
 
4.1%
c 428223
 
4.0%
Other values (15) 1654332
15.6%
Decimal Number
ValueCountFrequency (%)
4 43049
72.8%
6 13694
 
23.2%
2 1955
 
3.3%
3 265
 
0.4%
0 51
 
0.1%
1 37
 
0.1%
5 27
 
< 0.1%
9 8
 
< 0.1%
8 6
 
< 0.1%
7 2
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 442588
> 99.9%
. 9
 
< 0.1%
' 3
 
< 0.1%
, 2
 
< 0.1%
# 2
 
< 0.1%
? 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2079819
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 46987
100.0%
Open Punctuation
ValueCountFrequency (%)
( 26651
100.0%
Close Punctuation
ValueCountFrequency (%)
) 26649
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23081484
89.6%
Common 2681808
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 1945863
 
8.4%
t 1534332
 
6.6%
E 1435030
 
6.2%
i 1318066
 
5.7%
e 1090694
 
4.7%
a 1076590
 
4.7%
n 1021750
 
4.4%
o 971714
 
4.2%
T 910049
 
3.9%
N 869097
 
3.8%
Other values (41) 10908299
47.3%
Common
ValueCountFrequency (%)
2079819
77.6%
/ 442588
 
16.5%
- 46987
 
1.8%
4 43049
 
1.6%
( 26651
 
1.0%
) 26649
 
1.0%
6 13694
 
0.5%
2 1955
 
0.1%
3 265
 
< 0.1%
0 51
 
< 0.1%
Other values (11) 100
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25763292
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2079819
 
8.1%
S 1945863
 
7.6%
t 1534332
 
6.0%
E 1435030
 
5.6%
i 1318066
 
5.1%
e 1090694
 
4.2%
a 1076590
 
4.2%
n 1021750
 
4.0%
o 971714
 
3.8%
T 910049
 
3.5%
Other values (62) 12379385
48.1%

VEHICLE TYPE CODE 3
Categorical

HIGH CARDINALITY
MISSING

Distinct 230
Distinct (%) 0.2%
Missing 1817465
Missing (%) 93.2%
Memory size 14.9 MiB
Sedan 39284
Station Wagon/Sport Utility Vehicle 31740
PASSENGER VEHICLE 27713
SPORT UTILITY / STATION WAGON 13358
UNKNOWN 3283
Other values (225) 16787

Length

Max length 35
Median length 30
Mean length 17.685234
Min length 2

Characters and Unicode

Total characters 2337369
Distinct characters 61
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 133
Unique (%) 0.1%

Sample

1st row Sedan
2nd row Station Wagon/Sport Utility Vehicle
3rd row Sedan
4th row Sedan
5th row Sedan

Common Values

Value Count Frequency (%)
Sedan 39284 2.0%
Station Wagon/Sport Utility Vehicle 31740 1.6%
PASSENGER VEHICLE 27713 1.4%
SPORT UTILITY / STATION WAGON 13358 0.7%
UNKNOWN 3283 0.2%
4 dr sedan 2561 0.1%
Taxi 2053 0.1%
Pick-up Truck 1984 0.1%
VAN 1366 0.1%
OTHER 1045 0.1%
Other values (220) 7778 0.4%
(Missing) 1817465 93.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vehicle 59889
18.6%
utility 45100
14.0%
station 45099
14.0%
sedan 42028
13.0%
wagon/sport 31740
9.8%
passenger 27715
8.6%
13429
 
4.2%
sport 13358
 
4.1%
wagon 13358
 
4.1%
truck 3795
 
1.2%
Other values (187) 27081
8.4%

Most occurring characters

ValueCountFrequency (%)
190862
 
8.2%
S 186706
 
8.0%
t 159926
 
6.8%
i 132163
 
5.7%
E 116348
 
5.0%
a 108689
 
4.7%
e 108172
 
4.6%
n 106231
 
4.5%
o 97774
 
4.2%
T 76222
 
3.3%
Other values (51) 1054276
45.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1048847
44.9%
Uppercase Letter 1044394
44.7%
Space Separator 190862
 
8.2%
Other Punctuation 45170
 
1.9%
Decimal Number 3634
 
0.2%
Dash Punctuation 2710
 
0.1%
Open Punctuation 876
 
< 0.1%
Close Punctuation 876
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 186706
17.9%
E 116348
11.1%
T 76222
 
7.3%
I 71388
 
6.8%
N 65699
 
6.3%
V 62915
 
6.0%
A 57887
 
5.5%
U 50119
 
4.8%
W 48475
 
4.6%
O 46568
 
4.5%
Other values (15) 262067
25.1%
Lowercase Letter
ValueCountFrequency (%)
t 159926
15.2%
i 132163
12.6%
a 108689
10.4%
e 108172
10.3%
n 106231
10.1%
o 97774
9.3%
l 64685
6.2%
d 44937
 
4.3%
r 39465
 
3.8%
c 38159
 
3.6%
Other values (14) 148646
14.2%
Decimal Number
ValueCountFrequency (%)
4 2996
82.4%
6 442
 
12.2%
2 184
 
5.1%
3 9
 
0.2%
0 1
 
< 0.1%
1 1
 
< 0.1%
8 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
190862
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 45170
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2710
100.0%
Open Punctuation
ValueCountFrequency (%)
( 876
100.0%
Close Punctuation
ValueCountFrequency (%)
) 876
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2093241
89.6%
Common 244128
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 186706
 
8.9%
t 159926
 
7.6%
i 132163
 
6.3%
E 116348
 
5.6%
a 108689
 
5.2%
e 108172
 
5.2%
n 106231
 
5.1%
o 97774
 
4.7%
T 76222
 
3.6%
I 71388
 
3.4%
Other values (39) 929622
44.4%
Common
ValueCountFrequency (%)
190862
78.2%
/ 45170
 
18.5%
4 2996
 
1.2%
- 2710
 
1.1%
( 876
 
0.4%
) 876
 
0.4%
6 442
 
0.2%
2 184
 
0.1%
3 9
 
< 0.1%
0 1
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2337369
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
190862
 
8.2%
S 186706
 
8.0%
t 159926
 
6.8%
i 132163
 
5.7%
E 116348
 
5.0%
a 108689
 
4.7%
e 108172
 
4.6%
n 106231
 
4.5%
o 97774
 
4.2%
T 76222
 
3.3%
Other values (51) 1054276
45.1%

VEHICLE TYPE CODE 4
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct 91
Distinct (%) 0.3%
Missing 1920206
Missing (%) 98.5%
Memory size 14.9 MiB
Sedan 9375
Station Wagon/Sport Utility Vehicle 7627
PASSENGER VEHICLE 5969
SPORT UTILITY / STATION WAGON 2852
UNKNOWN 595
Other values (86) 3006

Length

Max length 35
Median length 30
Mean length 17.954051
Min length 2

Characters and Unicode

Total characters 528280
Distinct characters 57
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 37
Unique (%) 0.1%

Sample

1st row Station Wagon/Sport Utility Vehicle
2nd row Sedan
3rd row Station Wagon/Sport Utility Vehicle
4th row Sedan
5th row Sedan

Common Values

Value Count Frequency (%)
Sedan 9375 0.5%
Station Wagon/Sport Utility Vehicle 7627 0.4%
PASSENGER VEHICLE 5969 0.3%
SPORT UTILITY / STATION WAGON 2852 0.1%
UNKNOWN 595 < 0.1%
4 dr sedan 566 < 0.1%
Pick-up Truck 418 < 0.1%
Taxi 408 < 0.1%
VAN 242 < 0.1%
OTHER 189 < 0.1%
Other values (81) 1183 0.1%
(Missing) 1920206 98.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vehicle 13652
18.9%
station 10479
14.5%
utility 10479
14.5%
sedan 9984
13.8%
wagon/sport 7627
10.6%
passenger 5969
8.3%
2858
 
4.0%
sport 2852
 
3.9%
wagon 2852
 
3.9%
truck 686
 
0.9%
Other values (91) 4780
 
6.6%

Most occurring characters

ValueCountFrequency (%)
42850
 
8.1%
S 42513
 
8.0%
t 38318
 
7.3%
i 31472
 
6.0%
a 25823
 
4.9%
e 25619
 
4.8%
n 25355
 
4.8%
E 24660
 
4.7%
o 23274
 
4.4%
T 15902
 
3.0%
Other values (47) 232494
44.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 248525
47.0%
Uppercase Letter 224914
42.6%
Space Separator 42850
 
8.1%
Other Punctuation 10485
 
2.0%
Decimal Number 725
 
0.1%
Dash Punctuation 553
 
0.1%
Open Punctuation 114
 
< 0.1%
Close Punctuation 114
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 42513
18.9%
E 24660
11.0%
T 15902
 
7.1%
I 15042
 
6.7%
V 14124
 
6.3%
N 13718
 
6.1%
A 12211
 
5.4%
U 11365
 
5.1%
W 11085
 
4.9%
O 9648
 
4.3%
Other values (14) 54646
24.3%
Lowercase Letter
ValueCountFrequency (%)
t 38318
15.4%
i 31472
12.7%
a 25823
10.4%
e 25619
10.3%
n 25355
10.2%
o 23274
9.4%
l 15452
6.2%
d 10617
 
4.3%
r 9095
 
3.7%
c 8819
 
3.5%
Other values (13) 34681
14.0%
Decimal Number
ValueCountFrequency (%)
4 622
85.8%
6 58
 
8.0%
2 42
 
5.8%
3 2
 
0.3%
5 1
 
0.1%
Space Separator
ValueCountFrequency (%)
42850
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 10485
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 553
100.0%
Open Punctuation
ValueCountFrequency (%)
( 114
100.0%
Close Punctuation
ValueCountFrequency (%)
) 114
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 473439
89.6%
Common 54841
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 42513
 
9.0%
t 38318
 
8.1%
i 31472
 
6.6%
a 25823
 
5.5%
e 25619
 
5.4%
n 25355
 
5.4%
E 24660
 
5.2%
o 23274
 
4.9%
T 15902
 
3.4%
l 15452
 
3.3%
Other values (37) 205051
43.3%
Common
ValueCountFrequency (%)
42850
78.1%
/ 10485
 
19.1%
4 622
 
1.1%
- 553
 
1.0%
( 114
 
0.2%
) 114
 
0.2%
6 58
 
0.1%
2 42
 
0.1%
3 2
 
< 0.1%
5 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 528280
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
42850
 
8.1%
S 42513
 
8.0%
t 38318
 
7.3%
i 31472
 
6.0%
a 25823
 
4.9%
e 25619
 
4.8%
n 25355
 
4.8%
E 24660
 
4.7%
o 23274
 
4.4%
T 15902
 
3.0%
Other values (47) 232494
44.0%

VEHICLE TYPE CODE 5
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct 63
Distinct (%) 0.8%
Missing 1941717
Missing (%) 99.6%
Memory size 14.9 MiB
Sedan 2602
Station Wagon/Sport Utility Vehicle 2145
PASSENGER VEHICLE 1487
SPORT UTILITY / STATION WAGON 802
Pick-up Truck 137
Other values (58) 740

Length

Max length 35
Median length 30
Mean length 18.219133
Min length 2

Characters and Unicode

Total characters 144168
Distinct characters 54
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 24
Unique (%) 0.3%

Sample

1st row Station Wagon/Sport Utility Vehicle
2nd row Station Wagon/Sport Utility Vehicle
3rd row Sedan
4th row Sedan
5th row Station Wagon/Sport Utility Vehicle

Common Values

Value Count Frequency (%)
Sedan 2602 0.1%
Station Wagon/Sport Utility Vehicle 2145 0.1%
PASSENGER VEHICLE 1487 0.1%
SPORT UTILITY / STATION WAGON 802 < 0.1%
Pick-up Truck 137 < 0.1%
4 dr sedan 123 < 0.1%
Taxi 98 < 0.1%
UNKNOWN 94 < 0.1%
VAN 50 < 0.1%
OTHER 49 < 0.1%
Other values (53) 326 < 0.1%
(Missing) 1941717 99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
vehicle 3641
18.5%
station 2947
15.0%
utility 2947
15.0%
sedan 2739
13.9%
wagon/sport 2145
10.9%
passenger 1487
7.6%
804
 
4.1%
wagon 804
 
4.1%
sport 802
 
4.1%
truck 222
 
1.1%
Other values (57) 1129
 
5.7%

Most occurring characters

ValueCountFrequency (%)
11764
 
8.2%
S 11517
 
8.0%
t 10785
 
7.5%
i 8862
 
6.1%
a 7182
 
5.0%
e 7138
 
5.0%
n 7077
 
4.9%
o 6568
 
4.6%
E 6124
 
4.2%
T 4465
 
3.1%
Other values (44) 62686
43.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 69752
48.4%
Uppercase Letter 59321
41.1%
Space Separator 11764
 
8.2%
Other Punctuation 2949
 
2.0%
Dash Punctuation 175
 
0.1%
Decimal Number 161
 
0.1%
Close Punctuation 23
 
< 0.1%
Open Punctuation 23
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 11517
19.4%
E 6124
10.3%
T 4465
 
7.5%
I 4007
 
6.8%
V 3746
 
6.3%
N 3428
 
5.8%
A 3209
 
5.4%
U 3114
 
5.2%
W 3046
 
5.1%
O 2624
 
4.4%
Other values (13) 14041
23.7%
Lowercase Letter
ValueCountFrequency (%)
t 10785
15.5%
i 8862
12.7%
a 7182
10.3%
e 7138
10.2%
n 7077
10.1%
o 6568
9.4%
l 4347
6.2%
d 2884
 
4.1%
r 2557
 
3.7%
c 2538
 
3.6%
Other values (12) 9814
14.1%
Decimal Number
ValueCountFrequency (%)
4 133
82.6%
2 14
 
8.7%
6 13
 
8.1%
3 1
 
0.6%
Space Separator
ValueCountFrequency (%)
11764
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2949
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 175
100.0%
Close Punctuation
ValueCountFrequency (%)
) 23
100.0%
Open Punctuation
ValueCountFrequency (%)
( 23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 129073
89.5%
Common 15095
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 11517
 
8.9%
t 10785
 
8.4%
i 8862
 
6.9%
a 7182
 
5.6%
e 7138
 
5.5%
n 7077
 
5.5%
o 6568
 
5.1%
E 6124
 
4.7%
T 4465
 
3.5%
l 4347
 
3.4%
Other values (35) 55008
42.6%
Common
ValueCountFrequency (%)
11764
77.9%
/ 2949
 
19.5%
- 175
 
1.2%
4 133
 
0.9%
) 23
 
0.2%
( 23
 
0.2%
2 14
 
0.1%
6 13
 
0.1%
3 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 144168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11764
 
8.2%
S 11517
 
8.0%
t 10785
 
7.5%
i 8862
 
6.1%
a 7182
 
5.0%
e 7138
 
5.0%
n 7077
 
4.9%
o 6568
 
4.6%
E 6124
 
4.2%
T 4465
 
3.1%
Other values (44) 62686
43.5%

Interactions

[Figures: 64 pairwise interaction plots]

Correlations


Auto

The auto setting selects an interpretable pairwise metric based on the types of the two columns:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used to discretize the numerical column in a Numerical-Categorical pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
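
The mapping above can be sketched as a small dispatch function. This is an illustration only, not the report generator's own code: `auto_metric` and `discretize` are hypothetical helper names, and equal-width binning is an assumption, since the report does not state how the numerical column is discretized.

```python
def auto_metric(type_a, type_b):
    """Return (method, value range) for a pair of column types."""
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError("column types must be 'numerical' or 'categorical'")
    if type_a == "numerical" and type_b == "numerical":
        return "Spearman's rho", (-1, 1)
    # Categorical-categorical and numerical-categorical pairs both use Cramer's V.
    return "Cramer's V", (0, 1)


def discretize(values, n_bins):
    """Assign each number to one of n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_metric("numerical", "categorical"))
print(discretize([0, 5, 10], n_bins=2))
```

A larger `n_bins` gives a finer-grained discretization, which is the granularity trade-off mentioned above.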

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
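
The calculation described above can be sketched in plain Python. This is a minimal illustration, assuming average ranks for tied values; the function names are hypothetical and this is not the code used to produce the report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship still has rho close to 1.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```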

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
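
As a rough sketch of this definition (not the report generator's implementation; the function name and the toy data are illustrative), r can be computed directly. The last lines check the invariance under a shift and a positive rescaling of one variable.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n factors of the covariance and the standard deviations cancel.
    return cov / (sx * sy)


x = [1, 2, 4, 7]
y = [2, 1, 5, 6]
r_original = pearson_r(x, y)
r_rescaled = pearson_r([3 * a + 7 for a in x], y)  # shift and positive rescale of x
print(r_original, r_rescaled)
```

A constant column has zero standard deviation, so this sketch would raise `ZeroDivisionError` for it.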

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
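
The pair-counting definition above corresponds to the tau-a variant, sketched below with a hypothetical function name. Statistical libraries often default to a tie-adjusted variant instead, so results may differ when the data contain ties; which variant the report uses is not stated here.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant out of 3 pairs
```

This direct version compares every pair, so its cost grows quadratically with the number of observations.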

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
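
A minimal sketch of a bias-corrected Cramér's V in plain Python follows. It assumes the commonly cited form of the Bergsma (2013) correction, applied to φ² = χ²/n and to the numbers of rows and columns; the function name is hypothetical, and the report generator's exact code may differ.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramer's V for two equally long sequences of categories."""
    n = len(a)
    joint = Counter(zip(a, b))
    rows, cols = Counter(a), Counter(b)
    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_value, row_count in rows.items():
        for col_value, col_count in cols.items():
            expected = row_count * col_count / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    r, k = len(rows), len(cols)
    # Assumed correction terms: shrink phi2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(perfect, independent)
```

In the toy data, the first pair of columns is perfectly associated and the second pair is independent, so the two results sit at the ends of the [0, 1] range.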

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
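
Nullity correlation can be sketched as the correlation between presence indicators of two columns. The example below assumes Pearson correlation on 0/1 indicators, which is a common choice for such heatmaps but is not confirmed by this report; the function name and the toy columns are made up for illustration, with `None` standing in for a missing value.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing has no defined correlation.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Toy columns (not taken from the dataset).
code_3 = ["Sedan", None, "Taxi", None, None]
code_4 = ["Sedan", None, "Van", None, None]
other = [None, "Unspecified", None, "Unspecified", "Unspecified"]

print(nullity_correlation(code_3, code_4))  # missing together
print(nullity_correlation(code_3, other))   # one present exactly when the other is missing
```

Values near 1 mean two columns tend to be missing together, and values near -1 mean one tends to be present when the other is missing.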

Sample

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Pavement Slippery | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN
2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN
3 | 09/11/2021 | 9:35 | BROOKLYN | 11208.0 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN
4 | 12/14/2021 | 8:13 | BROOKLYN | 11233.0 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN
5 | 04/14/2021 | 12:47 | NaN | NaN | NaN | NaN | NaN | MAJOR DEEGAN EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4407458 | Dump | Sedan | NaN | NaN | NaN
6 | 12/14/2021 | 17:05 | NaN | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN
7 | 12/14/2021 | 8:17 | BRONX | 10475.0 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4486660 | Sedan | Sedan | NaN | NaN | NaN
8 | 12/14/2021 | 21:10 | BROOKLYN | 11207.0 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | NaN | NaN | NaN | 4487074 | Sedan | NaN | NaN | NaN | NaN
9 | 12/14/2021 | 14:58 | MANHATTAN | 10017.0 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
1949620 | 11/28/2022 | 9:20 | BROOKLYN | 11206.0 | 40.699790 | -73.950096 | (40.69979, -73.950096) | MARCY AVENUE | FLUSHING AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Driver Inattention/Distraction | Unspecified | NaN | NaN | 4586261 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Bus | NaN | NaN
1949621 | 11/29/2022 | 21:41 | QUEENS | 11354.0 | 40.764650 | -73.823494 | (40.76465, -73.823494) | NORTHERN BOULEVARD | PARSONS BOULEVARD | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4585945 | Station Wagon/Sport Utility Vehicle | Motorcycle | NaN | NaN | NaN
1949622 | 11/29/2022 | 13:05 | QUEENS | 11434.0 | 40.667522 | -73.780630 | (40.667522, -73.78063) | NORTH CONDUIT AVENUE | ROCKAWAY BOULEVARD | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Unsafe Lane Changing | Unsafe Speed | NaN | NaN | NaN | 4586024 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN
1949623 | 11/13/2022 | 14:45 | NaN | NaN | NaN | NaN | NaN | TRIBOROUGH BRIDGE | NaN | NaN | 5.0 | 0.0 | 0 | 0 | 0 | 0 | 5 | 0 | Following Too Closely | Following Too Closely | Unspecified | NaN | NaN | 4586350 | Sedan | Station Wagon/Sport Utility Vehicle | Taxi | NaN | NaN
1949624 | 11/29/2022 | 15:22 | NaN | NaN | 40.630820 | -73.886360 | (40.63082, -73.88636) | ROCKAWAY PARKWAY | SHORE PARKWAY | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | Unspecified | NaN | NaN | 4586083 | Station Wagon/Sport Utility Vehicle | Taxi | Station Wagon/Sport Utility Vehicle | NaN | NaN
1949625 | 11/29/2022 | 2:20 | STATEN ISLAND | 10305.0 | 40.611940 | -74.070380 | (40.61194, -74.07038) | NaN | NaN | 255 HYLAN BOULEVARD | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4585934 | Sedan | NaN | NaN | NaN | NaN
1949626 | 11/29/2022 | 15:05 | BROOKLYN | 11220.0 | 40.639854 | -74.012200 | (40.639854, -74.0122) | 57 STREET | 6 AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4586337 | Sedan | Distributo | NaN | NaN | NaN
1949627 | 11/24/2022 | 22:00 | NaN | NaN | 40.812073 | -73.936040 | (40.812073, -73.93604) | EAST 135 STREET | MADISON AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4586345 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
1949628 | 10/18/2022 | 15:00 | NaN | NaN | 40.797035 | -73.929825 | (40.797035, -73.929825) | EAST 120 STREET | PLEASANT AVENUE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4586360 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
1949629 | 11/29/2022 | 6:25 | BROOKLYN | 11210.0 | 40.625275 | -73.946610 | (40.625275, -73.94661) | NaN | NaN | 2442 NOSTRAND AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | NaN | NaN | 4585982 | Sedan | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN